Re: How to group on a group id that is present inside a complex hierarchy

Alan Gates Wed, 06 Apr 2011 06:52:34 -0700

Approach 2 should work, except for a bug in the way flatten schemas are handled (the bug will be fixed in 0.9, fwiw). If you specify the schema after the flatten, I think it will work.


Change


at = foreach inputTuples generate flatten($0.$0#'stdout');

to

at = foreach inputTuples generate flatten($0.$0#'stdout') as (day, age, name, address, m);

The issue is that when a flatten doesn't have a schema, it assigns a schema of a single bytearray instead of setting the schema to null so that Pig can figure it out at runtime.
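Putting the fix together with the group from Approach 2, the whole pipeline might look like the sketch below. It assumes `inputTuples` is the relation produced by the UDF in the question, and the names `day`, `age`, `name`, `address` and `m` are simply the ones used in the AS clause above. The `describe` line is only there to check that the flattened relation now has five fields rather than the single bytearray reported in the "Out of bound access" error.

```pig
-- Sketch only: inputTuples is assumed to be the UDF output from the question.
-- Naming the flattened fields avoids the single-bytearray schema that flatten
-- otherwise assigns.
at = foreach inputTuples generate flatten($0.$0#'stdout') as (day, age, name, address, m);

-- Should now list five fields instead of {bytearray}.
describe at;

-- m is the map in the 5th field; look up k1 and group on its value.
grped = group at by m#'k1';

dump grped;
```

If the schema fix takes effect, the dump should show one group per distinct value of k1, i.e. the v1 and v2 groups described in the original question.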


Alan.


On Apr 5, 2011, at 9:41 PM, deepak kumar v wrote:

gentle reminder
On Fri, Apr 1, 2011 at 10:08 AM, deepak kumar v <[email protected]> wrote:
any response?
On Tue, Mar 29, 2011 at 3:32 PM, deepak kumar v <[email protected]> wrote:
Hi,
Below is a list of tuples generated by a UDF:

( ( [stdout#{ (day, age, name, address, ['k1#v1','k2#v2'] ) } ] ) )
( ( [stdout#{ (12/2,22,deepak,newyork,  ['k1#v2','k2#v2'] ) } ] ) )
( ( [stdout#{ (12/3,22,deepak,newyork,  ['k1#v1','k2#v2'] ) } ] ) )
Expected groups:
group a -- ( v1 , { (day, age, name, address, ['k1#v1','k2#v2']), (12/3,22,deepak,newjersy,  ['k1#v1','k2#v2']) } )
group b -- ( v2 , { (12/2,22,deepak,newyork,  ['k1#v2','k2#v2']) } )

I need to group by k1 so that I get two groups.
*Approach #1*
grped = group inputTuples by $0.$0.#'stdout'.$0.$0.$5#'k1'

Error:
2011-03-29 15:16:44,589 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_MAP 1 time(s).
2011-03-29 15:16:44,589 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2011-03-29 15:16:44,589 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-03-29 15:16:44,593 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
Details at logfile: /home/deepakkv/pigtemp/testworkflow/pig_1301391996435.log

*Approach #2*
As a result, I flattened inputTuples as follows:
flat = foreach inputTuples generate flatten($0.$0#'stdout');

(day, age, name, address, ['k1#v1','k2#v2'] )
(12/2,22,deepak,newyork,  ['k1#v2','k2#v2'] )
(12/3,22,deepak,newyork,  ['k1#v1','k2#v2'] )
Now, since I need to group on k1, which is in a map that is the 5th item (index 4), I tried:
grped = group flat by $4#'k1';

Error
2011-03-29 15:25:28,459 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/deepakkv/pigtemp/testworkflow/pig_1301392528456.log
2011-03-29 15:25:28,554 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2011-03-29 15:25:28,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Out of bound access. Trying to access non-existent column: 4. Schema {bytearray} has 1 column(s).
Details at logfile: /home/deepakkv/pigtemp/testworkflow/pig_1301392528456.log

*Approach #3*
As a result, I tried:
grped = group flat by $0.$4#'k1';

Error:
2011-03-29 15:27:18,081 [Thread-13] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
       at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:916)
How can I group tuples on a group id that is nested inside Tuple -> Bag -> Map -> Tuple (given key) -> 4th item (a map again) -> key?

Regards,
Deepak
