pig-user  

Re: Accessing fields in Tuple

Kelvin Moss
Thu, 05 Nov 2009 20:16:32 -0800

 
Thanks for the reply. I understand that a Tuple can have more than one field. 
That is why I was expecting Tuple.getAll to return all the fields in the 
Tuple. But as it turns out, it returns a Tuple. That made me think that maybe 
Tuple.getAll returns all the Tuples in the Tuple, but a Tuple like this is not 
valid, right?
 
((1,2,3),(4,5,6))
 
It should be enclosed in a bag like {(1,2,3),(4,5,6)}. Or maybe I am confusing 
things? 
 
Thanks!

--- On Thu, 11/5/09, Jeff Zhang <zjf...@gmail.com> wrote:


From: Jeff Zhang <zjf...@gmail.com>
Subject: Re: Accessing fields in Tuple
To: pig-user@hadoop.apache.org
Date: Thursday, November 5, 2009, 7:44 PM


The input holds the arguments you pass to your UDF, wrapped in a tuple. A
tuple can have more than one element, which means your UDF can take more
than one argument. Here you pass a single argument, and that argument is
itself a tuple.
So the first element of input is a tuple.
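
To make the nesting concrete, here is a minimal sketch that models it with
plain java.util.List objects standing in for Pig's Tuple, so it runs without
Pig on the classpath. The class name UdfInputSketch and the method sumAll are
made up for illustration; they are not part of Pig. The outer list plays the
role of input (one element per UDF argument) and the inner list plays the
role of the (long,long,long) tuple you passed in.

```java
import java.util.Arrays;
import java.util.List;

public class UdfInputSketch {
    // Sums every field of every argument.
    // Assumption: each argument is itself a tuple of Longs,
    // modeled here as a List<Object> instead of a Pig Tuple.
    static long sumAll(List<Object> input) {
        long sum = 0;
        for (Object arg : input) {
            // One level down: the argument is the inner tuple
            @SuppressWarnings("unchecked")
            List<Object> tuple = (List<Object>) arg;
            for (Object field : tuple) {
                sum += (Long) field;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Inner tuple: one row of the data file
        List<Object> row = Arrays.<Object>asList(1L, 2L, 3L);
        // Outer tuple: the UDF's argument list, holding a single argument
        List<Object> input = Arrays.<Object>asList(row);

        System.out.println(input.size());   // prints 1
        System.out.println(input.get(0));   // prints [1, 2, 3]
        System.out.println(sumAll(input));  // prints 6
    }
}
```

In the real UDF the same idea would apply: casting the elements of
input.getAll() straight to Long fails because each element is the inner
tuple, not one of its fields.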


Jeff Zhang


On Thu, Nov 5, 2009 at 2:23 AM, Kelvin Moss <km_jr_use...@yahoo.com> wrote:

> Hi all,
>
> I have the following data file
>
> (1L,2L,3L)
> (4L,2L,1L)
> (8L,3L,4L)
>
> I am trying to write a UDF (like sum) that would add the fields in a Tuple.
> This works --
>
> import java.util.Iterator;
> import java.util.List;
>
> import org.apache.pig.EvalFunc;
> import org.apache.pig.backend.executionengine.ExecException;
> import org.apache.pig.data.Tuple;
>
> public class SumAll extends EvalFunc<Long> {
>     public Long exec(Tuple input) {
>         try {
>             return sum(input);
>         } catch (NumberFormatException e) {
>             e.printStackTrace();
>         } catch (ExecException e) {
>             e.printStackTrace();
>         }
>         // Fall back to 0 if the sum could not be computed
>         return 0L;
>     }
>
>     static protected Long sum(Tuple input) throws ExecException,
>             NumberFormatException {
>         long sum = 0;
>
>         // Cast each element to Tuple and add its three fields
>         List<Object> values = input.getAll();
>         for (Iterator<Object> it = values.iterator(); it.hasNext();) {
>             Tuple t = (Tuple)it.next();
>             sum += (Long)t.get(0);
>             sum += (Long)t.get(1);
>             sum += (Long)t.get(2);
>         }
>         return sum;
>     }
> }
>
> grunt> A = LOAD 'data2' as aa:bytearray;
> grunt> C = FOREACH A GENERATE UDF.SumAll((tuple(long,long,long))aa);
> grunt> dump C;
> 2009-11-05 10:07:09,266 [main] INFO
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully
> stored result in: "file:/tmp/temp1206478472/tmp-577036369"
> 2009-11-05 10:07:09,267 [main] INFO
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records
> written : 3
> 2009-11-05 10:07:09,267 [main] INFO
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes
> written : 0
> 2009-11-05 10:07:09,267 [main] INFO
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100%
> complete!
> 2009-11-05 10:07:09,267 [main] INFO
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (6L)
> (7L)
> (15L)
> grunt>
>
> Initially I thought that such a loop would work
>
> static protected Long sum(Tuple input) throws ExecException,
>         NumberFormatException {
>     long sum = 0;
>
>     List<Object> values = input.getAll(); // Would give all fields in Tuple??
>     for (Iterator<Object> it = values.iterator(); it.hasNext();) {
>         sum += (Long)it.next();
>     }
>     return sum;
> }
>
> But I get an error that Tuple can't be cast to Long. So my question is:
> what is input.getAll() returning? What is the structure of the data that
> gets passed to the exec function?
>
> Thanks!