Hi Daniel,

I have a bag of tuples:
inputBag =
{ (day, age, name, address,  ['k1#v1','k2#v2']),
(12/2,22,deepak,newyork,  ['k1#v1','k2#v2']),
(12/3,22,deepak,newjersy,  ['k1#v1','k2#v2'])
}

I need to invoke a UDF for each tuple, so I have to flatten the bag, which I
do as follows:

flatTuples = foreach inputBag generate FLATTEN($0);

Now I get a list of tuples:
(day, age, name, address,  ['k1#v1','k2#v2']),
(12/2,22,deepak,newyork,  ['k1#v1','k2#v2']),
(12/3,22,deepak,newjersy,  ['k1#v1','k2#v2'])

I tried a few options to invoke my UDF for each tuple:

1)
processed = foreach flatTuples generate com.myUDF.UDF($0);
I expected $0 to refer to the entire tuple (day, age, name, address,
 ['k1#v1','k2#v2']), but inside my UDF $0 contains only
(day, age, name, address). *For some unknown reason the map is not passed
into the UDF.*

2)
As option 1 did not work, I assumed that $0 refers to item 0, $1 to item 1,
and so on for the flattened tuples, so that
processed = foreach flatTuples generate com.myUDF.UDF($0, $1, $2, $3, $4);
would pass each item of the input tuple into the UDF. But this threw the
following error for $1:
"Out of bound access. Trying to access non-existent column: 1.
Schema {bytearray} has 1 column(s)"

3)
As option 2 did not work either, I tried
processed = foreach flatTuples generate com.myUDF.UDF($0.$0,
$0.$1, $0.$2, $0.$3, $0.$4);
assuming $0 would refer to the input tuple and $0.$0, $0.$1, ... would refer
to the individual items in the tuple.
But this threw the following error:
$0.$1 throws java.lang.ClassCastException: java.lang.String cannot be cast
to org.apache.pig.data.Tuple
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389)

Regards,
Deepak


Items 0-3 are of type chararray and item 4 is a map.

I iterate through these tuples

On Fri, Mar 18, 2011 at 8:48 AM, Daniel Dai <[email protected]> wrote:

> Hi, Deepak,
> Can you be more specific? I did some simple test and cannot reproduce. What
> is your query? UDF?
>
> Daniel
>
>
> On 03/16/2011 11:24 PM, deepak kumar v wrote:
>
>> Hi,
>> Below is the list of tuples generated after flattening a bag.
>>
>> (day, age, name, address,  ['k1#v1','k2#v2']),
>> (12/2,22,deepak,newyork,  ['k1#v1','k2#v2']),
>> (12/3,22,deepak,newjersy,  ['k1#v1','k2#v2'])
>>
>> process = foreach inputs generate com.myUDF.UDF($0);
>> Here $0 somehow gets only (day, age, name, address) and the map is
>> skipped.
>> *How can I access the map?*
>> With
>> $1 i get "Out of bound access. Trying to access non-existent column: 1.
>> Schema {bytearray} has 1 column(s)",
>> $0.$1 throws java.lang.ClassCastException: java.lang.String cannot be cast
>> to org.apache.pig.data.Tuple
>> at
>>
>> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:389)
>>
>> Also,
>> With
>> tuples = foreach flattenedTuples generate $0
>> generates
>> (day, age, name, address),
>> (12/2,22,deepak,newyork),
>> (12/3,22,deepak,newjersy)
>>
>> After flattening, if I dump, I see the map in the resultant tuples, but $0,
>> instead of referring to the entire tuple, refers only to the data part
>> (map skipped).
>> Regards,
>> Deepak
>>
>
>
