I'm realizing that I need to self-join like this constantly; otherwise I can't make much of anything. I think I used to do this. Maybe Pig let it slide.
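
For anyone skimming the thread, here is roughly what the self-join is meant to compute, as a minimal sketch in plain Python rather than Pig. The sample rows and the `self_join` helper are made up for illustration. The field names come from the `describe pairs` output quoted below, and the `pairs::` / `pairs2::` prefixes mimic how Pig qualifies field names after a JOIN.

```python
# Sketch only: plain Python standing in for the Pig self-join in this thread.
# The helper name and the sample rows are hypothetical.
from collections import defaultdict

FIELDS = ("from", "to", "message_id", "in_reply_to")


def self_join(pairs):
    """Inner-join each row's in_reply_to against every row's message_id."""
    # Index the "second copy" of the relation by its join key.
    by_message_id = defaultdict(list)
    for row in pairs:
        by_message_id[row["message_id"]].append(row)

    joined = []
    for left in pairs:
        # Rows with a null join key never match in an inner join.
        if left["in_reply_to"] is None:
            continue
        for right in by_message_id.get(left["in_reply_to"], []):
            # Prefix each side so the duplicated field names stay distinguishable.
            out = {f"pairs::{name}": left[name] for name in FIELDS}
            out.update({f"pairs2::{name}": right[name] for name in FIELDS})
            joined.append(out)
    return joined


pairs = [
    {"from": "a@example.com", "to": "b@example.com",
     "message_id": "m1", "in_reply_to": None},
    {"from": "b@example.com", "to": "a@example.com",
     "message_id": "m2", "in_reply_to": "m1"},
]

for row in self_join(pairs):
    print(row["pairs::from"], "replied to", row["pairs2::from"])
```

The prefixes are the point Alan makes in the quoted thread: each side of the join needs its own name, or the two copies of every field cannot be told apart.
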
On Mon, Jul 23, 2012 at 2:48 PM, Russell Jurney <[email protected]> wrote:

> Thanks, that was my thinking. If I make an alias and self-JOIN to it, it
> should work. Self-joins this way are really powerful.
>
> On Mon, Jul 23, 2012 at 2:36 PM, Sean Timm <[email protected]> wrote:
>
>> It seems the self join should work in Pig 0.10 if using an alias, but alas
>> it doesn't. See Jira PIG-2630:
>> https://issues.apache.org/jira/browse/PIG-2630
>>
>> -Sean
>>
>> On 7/20/2012 12:01 PM, Alan Gates wrote:
>>
>>> It isn't a bug that you need to declare the join twice in your script.
>>> That is necessary for clarity and semantic correctness. That is, if we
>>> allowed:
>>>
>>> A = load 'bla';
>>> B = join A by user, A by user;
>>>
>>> then you'd have two user fields in B with no way to disambiguate.
>>> What's a bug (or missed optimization opportunity) is that we actually
>>> double read and shuffle the data. We could optimize here and only read
>>> and shuffle one copy and then do the join in the reduce.
>>>
>>> Alan.
>>>
>>> On Jul 20, 2012, at 12:53 AM, Dmitriy Ryaboy wrote:
>>>
>>>> It's kind of a waste of IO and mappers. If not a bug, it's an
>>>> optimization opportunity.
>>>>
>>>> On Jul 19, 2012, at 10:34 PM, Bill Graham <[email protected]> wrote:
>>>>
>>>>> No, it isn't a bug as I see it. You need to load the two relations
>>>>> separately because a join is across two separate data sources.
>>>>>
>>>>> On Thu, Jul 19, 2012 at 10:10 PM, Russell Jurney
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> So it is a bug? Because Pig will not let me self JOIN. I have to LOAD
>>>>>> the data twice.
>>>>>>
>>>>>> On Thu, Jul 19, 2012 at 9:49 PM, Bill Graham <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> No, to Pig a self join is just like a regular join across two
>>>>>>> different relations. It just happens to be to the same input data.
>>>>>>>
>>>>>>> On Thu, Jul 19, 2012 at 8:39 PM, Russell Jurney <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Is this a bug?
>>>>>>>>
>>>>>>>> On Thu, Jul 19, 2012 at 8:00 PM, Robert Yerex <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The only way to get it to work is to load a second copy.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 19, 2012 at 7:46 PM, Russell Jurney <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Note: this works if I LOAD a new, 2nd relation and do the join.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 19, 2012 at 7:34 PM, Russell Jurney <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have a problem where I can't join a relation to itself on a
>>>>>>>>>>> different field.
>>>>>>>>>>>
>>>>>>>>>>> describe pairs
>>>>>>>>>>> pairs: {from: chararray,to: chararray,message_id: chararray,in_reply_to: chararray}
>>>>>>>>>>>
>>>>>>>>>>> pairs2 = pairs;
>>>>>>>>>>>
>>>>>>>>>>> with_reply = join pairs by in_reply_to, pairs2 by message_id;
>>>>>>>>>>>
>>>>>>>>>>> I get this error:
>>>>>>>>>>>
>>>>>>>>>>> 2012-07-19 19:31:16,927 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>>>>>>>> ERROR 1200: Pig script failed to parse:
>>>>>>>>>>> <line 20, column 6> pig script failed to validate:
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225:
>>>>>>>>>>> Projection with nothing to reference!
>>>>>>>>>>> 2012-07-19 19:31:16,928 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>>>>>>>>>>> Failed to parse: Pig script failed to parse:
>>>>>>>>>>> <line 20, column 6> pig script failed to validate:
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225:
>>>>>>>>>>> Projection with nothing to reference!
>>>>>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
>>>>>>>>>>>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
>>>>>>>>>>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
>>>>>>>>>>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
>>>>>>>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>>>>>>>>>>>     at org.apache.pig.Main.run(Main.java:490)
>>>>>>>>>>>     at org.apache.pig.Main.main(Main.java:111)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>>>>>>> Caused by:
>>>>>>>>>>> <line 20, column 6> pig script failed to validate:
>>>>>>>>>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225:
>>>>>>>>>>> Projection with nothing to reference!
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanBuilder.buildJoinOp(LogicalPlanBuilder.java:363)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.join_clause(LogicalPlanGenerator.java:11354)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1489)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
>>>>>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
>>>>>>>>>>>     ... 15 more
>>>>>>>>>>>
>>>>>>>>>>> What am I to do?
>>>>>>>>>>> --
>>>>>>>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Yerex
>>>>>>>>> Data Scientist
>>>>>>>>> Civitas Learning
>>>>>>>>> www.civitaslearning.com
>>>>>>>>
>>>>>>>> --
>>>>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>>>>>
>>>>>>> --
>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please email
>>>>>>> me at [email protected] going forward.*
>>>>>>
>>>>>> --
>>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>>>
>>>>> --
>>>>> *Note that I'm no longer using my Yahoo! email address. Please email
>>>>> me at [email protected] going forward.*
>
> --
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com

--
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
