I'm realizing that I need to self-JOIN constantly; otherwise I can't build much of anything. I think I used to do this, so maybe Pig used to let it slide.
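
For reference, here is a minimal sketch of the workaround described in the quoted thread below: LOAD the same data a second time and JOIN the two relations. The schema comes from my DESCRIBE output; the input path 'pairs.tsv', the tab delimiter, and the names replies, reply_id and original_id are assumptions for illustration only.

```pig
-- Hypothetical input path and delimiter; schema as in the DESCRIBE output.
pairs  = LOAD 'pairs.tsv' USING PigStorage('\t')
         AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);

-- Second LOAD of the same data, so the JOIN sees two distinct relations.
pairs2 = LOAD 'pairs.tsv' USING PigStorage('\t')
         AS (from:chararray, to:chararray, message_id:chararray, in_reply_to:chararray);

-- Match each reply to the message it replies to.
with_reply = JOIN pairs BY in_reply_to, pairs2 BY message_id;

-- After the JOIN, fields are disambiguated with the relation prefix.
replies = FOREACH with_reply GENERATE
            pairs::message_id  AS reply_id,
            pairs2::message_id AS original_id;
```

As noted in the thread, this reads and shuffles the input twice, so it trades some IO for a script that parses.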

On Mon, Jul 23, 2012 at 2:48 PM, Russell Jurney <russell.jur...@gmail.com> wrote:

> Thanks, that was my thinking. If I make an alias and self-JOIN to it, it
> should work. Self-joins this way are really powerful.
>
> On Mon, Jul 23, 2012 at 2:36 PM, Sean Timm <tim...@aol.com> wrote:
>
>> It seems the self join should work in Pig 0.10 if using an alias, but alas
>> it doesn't. See Jira PIG-2630.
>> https://issues.apache.org/jira/browse/PIG-2630
>>
>> -Sean
>>
>> On 7/20/2012 12:01 PM, Alan Gates wrote:
>>
>>> It isn't a bug that you need to declare the join twice in your script.
>>> That is necessary for clarity and semantic correctness. That is, if we
>>> allowed:
>>>
>>> A = load 'bla';
>>> B = join A by user, A by user;
>>>
>>> then you'd have two user fields in B with no way to disambiguate.
>>> What's a bug (or missed optimization opportunity) is that we actually
>>> double read and shuffle the data. We could optimize here and only read
>>> and shuffle one copy and then do the join in the reduce.
>>>
>>> Alan.
>>>
>>> On Jul 20, 2012, at 12:53 AM, Dmitriy Ryaboy wrote:
>>>
>>>> It's kind of a waste of IO and mappers. If not a bug, it's an
>>>> optimization opportunity.
>>>>
>>>> On Jul 19, 2012, at 10:34 PM, Bill Graham <billgra...@gmail.com> wrote:
>>>>
>>>>> No, it isn't a bug as I see it. You need to load the two relations
>>>>> separately because a join is across two separate data sources.
>>>>>
>>>>> On Thu, Jul 19, 2012 at 10:10 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>
>>>>>> So it is a bug? Because Pig will not let me self JOIN. I have to LOAD
>>>>>> the data twice.
>>>>>>
>>>>>> On Thu, Jul 19, 2012 at 9:49 PM, Bill Graham <billgra...@gmail.com> wrote:
>>>>>>
>>>>>>> No, to Pig a self join is just like a regular join across two
>>>>>>> different relations. It just happens to be to the same input data.
>>>>>>>
>>>>>>> On Thu, Jul 19, 2012 at 8:39 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Is this a bug?
>>>>>>>>
>>>>>>>> On Thu, Jul 19, 2012 at 8:00 PM, Robert Yerex <robert.ye...@civitaslearning.com> wrote:
>>>>>>>>
>>>>>>>>> The only way to get it to work is to load a second copy.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 19, 2012 at 7:46 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Note: this works if I LOAD a new, 2nd relation and do the join.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 19, 2012 at 7:34 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have a problem where I can't join a relation to itself on a
>>>>>>>>>>> different field.
>>>>>>>>>>>
>>>>>>>>>>> describe pairs
>>>>>>>>>>> pairs: {from: chararray,to: chararray,message_id: chararray,in_reply_to: chararray}
>>>>>>>>>>>
>>>>>>>>>>> pairs2 = pairs;
>>>>>>>>>>>
>>>>>>>>>>> with_reply = join pairs by in_reply_to, pairs2 by message_id;
>>>>>>>>>>>
>>>>>>>>>>> I get this error:
>>>>>>>>>>>
>>>>>>>>>>> 2012-07-19 19:31:16,927 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
>>>>>>>>>>> <line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
>>>>>>>>>>> 2012-07-19 19:31:16,928 [main] ERROR org.apache.pig.tools.grunt.Grunt - Failed to parse: Pig script failed to parse:
>>>>>>>>>>> <line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
>>>>>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
>>>>>>>>>>>     at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
>>>>>>>>>>>     at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
>>>>>>>>>>>     at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
>>>>>>>>>>>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>>>>>>>>>>     at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
>>>>>>>>>>>     at org.apache.pig.Main.run(Main.java:490)
>>>>>>>>>>>     at org.apache.pig.Main.main(Main.java:111)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>>>>>>>>>> Caused by:
>>>>>>>>>>> <line 20, column 6> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanBuilder.buildJoinOp(LogicalPlanBuilder.java:363)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.join_clause(LogicalPlanGenerator.java:11354)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1489)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:789)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:507)
>>>>>>>>>>>     at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:382)
>>>>>>>>>>>     at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
>>>>>>>>>>>     ... 15 more
>>>>>>>>>>>
>>>>>>>>>>> What am I to do?
>>>>>>>>>>> --
>>>>>>>>>>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Robert Yerex
>>>>>>>>> Data Scientist
>>>>>>>>> Civitas Learning
>>>>>>>>> www.civitaslearning.com
>>>>>>>
>>>>>>> --
>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please email
>>>>>>> me at billgra...@gmail.com going forward.*

--
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com