Gentle reminder.

On Fri, Apr 1, 2011 at 10:08 AM, deepak kumar v <[email protected]> wrote:
> Any response?
>
> On Tue, Mar 29, 2011 at 3:32 PM, deepak kumar v <[email protected]> wrote:
>
>> Hi,
>> Below is a list of tuples generated by a UDF.
>>
>> ( ( [stdout#{ (day, age, name, address, ['k1#v1','k2#v2'] ) } ] ) )
>> ( ( [stdout#{ (12/2,22,deepak,newyork, ['k1#v2','k2#v2'] ) } ] ) )
>> ( ( [stdout#{ (12/3,22,deepak,newyork, ['k1#v1','k2#v2'] ) } ] ) )
>>
>> group a -- ( v1 , { (day, age, name, address, ['k1#v1','k2#v2'] ), (12/3,22,deepak,newjersy, ['k1#v1','k2#v2']) } )
>> group b -- ( v2 , { (12/2,22,deepak,newyork, ['k1#v2','k2#v2'])} )
>>
>> I need to run a group by on k1 so that I have two groups.
>>
>> *Approach #1*
>> grped = group inputTuples by $0.$0.#'stdout'.$0.$0.$5#'k1'
>>
>> Error:
>> 2011-03-29 15:16:44,589 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_MAP 1 time(s).
>> 2011-03-29 15:16:44,589 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
>> 2011-03-29 15:16:44,589 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
>> 2011-03-29 15:16:44,593 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
>> Details at logfile: /home/deepakkv/pigtemp/testworkflow/pig_1301391996435.log
>>
>> *Approach #2*
>> As a result I flattened inputTuples as follows:
>> flat = foreach inputTuples generate flatten($0.$0#'stdout');
>>
>> (day, age, name, address, ['k1#v1','k2#v2'] )
>> (12/2,22,deepak,newyork, ['k1#v2','k2#v2'] )
>> (12/3,22,deepak,newyork, ['k1#v1','k2#v2'] )
>>
>> So now, as I need to group on k1, which is present in a map that is the 5th item (index 4), I tried:
>> grped = group flat by $4#'k1';
>>
>> Error:
>> 2011-03-29 15:25:28,459 [main] INFO org.apache.pig.Main - Logging error messages to: /home/deepakkv/pigtemp/testworkflow/pig_1301392528456.log
>> 2011-03-29 15:25:28,554 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
>> 2011-03-29 15:25:28,750 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Out of bound access. Trying to access non-existent column: 4. Schema {bytearray} has 1 column(s).
>> Details at logfile: /home/deepakkv/pigtemp/testworkflow/pig_1301392528456.log
>>
>> *Approach #3*
>> As a result I tried:
>> grped = group flat by $0.$4#'k1';
>>
>> Error:
>> 2011-03-29 15:27:18,081 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
>> java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:138)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:276)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:916)
>>
>> How can I group tuples on a group id which is present inside a Tuple -> Bag -> Map -> Tuple (given key) -> 4th item (which is a map again) -> key?
>>
>> Regards,
>> Deepak
