Thejas Nair
Sat, 07 Nov 2009 15:54:27 -0800
Hi Kelvin,
The input parameters to the UDF are wrapped inside a tuple, which is then
passed as the input to the exec function in the UDF.
In the case of
>> grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
the exec function gets a Tuple with one column, which is a
tuple(long,long,long);
i.e., in exec(Tuple input), input.get(0) will return the tuple(long,long,long).
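To make the nesting concrete, here is a minimal Java sketch. It does not use the Pig API: it models a Pig Tuple as a java.util.List<Object>, and the names NestedTupleSketch, tuple and sumFirstArgument are hypothetical. With real Pig, the corresponding calls would be input.get(0) and getAll() on org.apache.pig.data.Tuple.

```java
import java.util.Arrays;
import java.util.List;

// Model only: a Pig Tuple is represented as a List<Object>; this is not the Pig API.
public class NestedTupleSketch {

    // Hypothetical helper that builds a "tuple" from its fields.
    public static List<Object> tuple(Object... fields) {
        return Arrays.asList(fields);
    }

    // Mirrors exec(Tuple input) for SumAll((tuple(long,long,long))aa):
    // the input tuple has exactly one field, and that field is the inner tuple.
    @SuppressWarnings("unchecked")
    public static long sumFirstArgument(List<Object> input) {
        List<Object> inner = (List<Object>) input.get(0); // like input.get(0) in Pig
        long sum = 0;
        for (Object field : inner) { // like iterating over inner.getAll() in Pig
            sum += (Long) field;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Object> input = tuple(tuple(1L, 2L, 3L)); // one column holding (1L,2L,3L)
        System.out.println(input.size());            // 1
        System.out.println(sumFirstArgument(input)); // 6
    }
}
```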
On the other hand, if you called the UDF this way:
>> grunt> C = FOREACH A GENERATE UDF.SumAll((long)a1,(chararray)a2);
then in exec(Tuple input), input.get(0) will return a long and input.get(1)
will return a chararray.
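A matching sketch for the flat call, under the same assumption (List<Object> standing in for Tuple; FlatArgumentsSketch, tuple and describe are hypothetical names). It assumes the long argument arrives as java.lang.Long and the chararray as java.lang.String, which is how Pig maps those types to Java.

```java
import java.util.Arrays;
import java.util.List;

// Model only: a Pig Tuple is represented as a List<Object>; this is not the Pig API.
public class FlatArgumentsSketch {

    // Hypothetical helper that builds a "tuple" from its fields.
    public static List<Object> tuple(Object... fields) {
        return Arrays.asList(fields);
    }

    // Mirrors exec(Tuple input) for SumAll((long)a1,(chararray)a2):
    // each argument is its own field of the input tuple.
    public static String describe(List<Object> input) {
        Long first = (Long) input.get(0);      // the (long)a1 argument
        String second = (String) input.get(1); // the (chararray)a2 argument
        return "fields=" + input.size() + ", long=" + first + ", chararray=" + second;
    }

    public static void main(String[] args) {
        System.out.println(describe(tuple(42L, "abc"))); // fields=2, long=42, chararray=abc
    }
}
```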
I hope this answers your question.
Thanks,
Thejas
On 11/5/09 9:15 PM, "Kelvin Moss" <km_jr_use...@yahoo.com> wrote:
>
> Thanks for the reply. I understand that Tuple can have more than one field.
> That is why I was expecting Tuple.getAll to return me all the fields in the
> Tuple. But as it turns out, it returns a Tuple. That made me think that maybe
> Tuple.getAll returns all the Tuples in the Tuple, but a Tuple like this is not
> valid, right?
>
> ((1,2,3),(4,5,6))
>
> It should be enclosed in a bag like {(1,2,3),(4,5,6)}. Or maybe I am
> confusing things?
>
> Thanks!
>
> --- On Thu, 11/5/09, Jeff Zhang <zjf...@gmail.com> wrote:
>
>
> From: Jeff Zhang <zjf...@gmail.com>
> Subject: Re: Accessing fields in Tuple
> To: pig-user@hadoop.apache.org
> Date: Thursday, November 5, 2009, 7:44 PM
>
>
> The input is the arguments you provide to your UDF, and it is of tuple type. A
> tuple can have more than one element, which means your UDF can have more
> than one argument. Here you provide one argument, of tuple type, to
> your UDF,
> so the first element of input is a tuple.
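Since each argument becomes one field of the input tuple, a UDF can be written to accept either shape. The sketch below is one possible approach, not something proposed in this thread: it recursively adds every Long it finds, descending into nested tuples. Tuples are again modelled as List<Object> (FlexibleSumSketch, tuple and sumAll are hypothetical names); with real Pig the instanceof check would be against org.apache.pig.data.Tuple.

```java
import java.util.Arrays;
import java.util.List;

// Model only: a Pig Tuple is represented as a List<Object>; this is not the Pig API.
public class FlexibleSumSketch {

    // Hypothetical helper that builds a "tuple" from its fields.
    public static List<Object> tuple(Object... fields) {
        return Arrays.asList(fields);
    }

    // Adds every Long in the input, descending into nested tuples, so it handles
    // both SumAll(t) with t a tuple(long,long,long) and SumAll(a, b, c) with three longs.
    @SuppressWarnings("unchecked")
    public static long sumAll(List<Object> input) {
        long sum = 0;
        for (Object field : input) {
            if (field instanceof List) {
                sum += sumAll((List<Object>) field); // nested tuple argument: recurse into it
            } else if (field instanceof Long) {
                sum += (Long) field;                 // plain long argument
            } else if (field != null) {
                throw new IllegalArgumentException(
                        "unsupported field type: " + field.getClass().getName());
            }
            // null fields are skipped
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAll(tuple(tuple(8L, 3L, 4L)))); // one tuple argument: 15
        System.out.println(sumAll(tuple(8L, 3L, 4L)));        // three long arguments: 15
    }
}
```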
>
>
> Jeff Zhang
>
>
> On Thu, Nov 5, 2009 at 2:23 AM, Kelvin Moss <km_jr_use...@yahoo.com> wrote:
>
>> Hi all,
>>
>> I have the following data file
>>
>> (1L,2L,3L)
>> (4L,2L,1L)
>> (8L,3L,4L)
>>
>> I am trying to write a UDF (like SUM) that would add up the fields in a Tuple.
>> This works --
>>
>> import java.util.Iterator;
>> import java.util.List;
>>
>> import org.apache.pig.EvalFunc;
>> import org.apache.pig.backend.executionengine.ExecException;
>> import org.apache.pig.data.Tuple;
>>
>> public class SumAll extends EvalFunc<Long> {
>>     public Long exec(Tuple input) {
>>         try {
>>             return sum(input);
>>         } catch (NumberFormatException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         } catch (ExecException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         }
>>         return 0L;
>>     }
>>
>>     static protected Long sum(Tuple input) throws ExecException,
>>             NumberFormatException {
>>         long sum = 0;
>>
>>         // input holds a single field: the tuple(long,long,long) argument
>>         List<Object> values = input.getAll();
>>         for (Iterator<Object> it = values.iterator(); it.hasNext();) {
>>             Tuple t = (Tuple) it.next();
>>             sum += (Long) t.get(0);
>>             sum += (Long) t.get(1);
>>             sum += (Long) t.get(2);
>>         }
>>         return sum;
>>     }
>>
>> }
>>
>> grunt> A = LOAD 'data2' as aa:bytearray;
>> grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
>> grunt> dump C;
>> 2009-11-05 10:07:09,266 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully
>> stored result in: "file:/tmp/temp1206478472/tmp-577036369"
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records
>> written : 3
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes
>> written : 0
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100%
>> complete!
>> 2009-11-05 10:07:09,267 [main] INFO
>> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
>> (6L)
>> (7L)
>> (15L)
>> grunt>
>>
>> Initially I thought that a loop like this would work:
>>
>> static protected Long sum(Tuple input) throws ExecException,
>>         NumberFormatException {
>>     long sum = 0;
>>
>>     List<Object> values = input.getAll(); // Would give all fields in Tuple??
>>     for (Iterator<Object> it = values.iterator(); it.hasNext();) {
>>         sum += (Long) it.next();
>>     }
>>     return sum;
>> }
>>
>> But I get an error that a Tuple can't be cast to Long. So my question is:
>> what is input.getAll() returning? What is the structure of the data that
>> gets passed to the exec function?
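The cast error can be reproduced with the same List<Object> model (GetAllSketch, tuple, naiveSum and castFails are hypothetical names, not the Pig API). When the UDF is called with a single tuple(long,long,long) argument, the only element that getAll() yields is the inner tuple, so casting it to Long throws a ClassCastException; the same loop would succeed if the three values arrived as three separate long fields.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Model only: a Pig Tuple is represented as a List<Object>; this is not the Pig API.
public class GetAllSketch {

    // Hypothetical helper that builds a "tuple" from its fields.
    public static List<Object> tuple(Object... fields) {
        return Arrays.asList(fields);
    }

    // The loop from the question: every element of the input's fields is cast straight to Long.
    public static long naiveSum(List<Object> input) {
        long sum = 0;
        for (Iterator<Object> it = input.iterator(); it.hasNext();) {
            sum += (Long) it.next(); // ClassCastException if the element is a nested tuple
        }
        return sum;
    }

    // Returns true when naiveSum fails with a ClassCastException.
    public static boolean castFails(List<Object> input) {
        try {
            naiveSum(input);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Object> oneTupleArgument = tuple(tuple(1L, 2L, 3L)); // like SumAll((tuple(long,long,long))aa)
        List<Object> threeLongArguments = tuple(1L, 2L, 3L);      // three separate long fields
        System.out.println(castFails(oneTupleArgument));  // true
        System.out.println(naiveSum(threeLongArguments)); // 6
    }
}
```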
>>
>> Thanks!