[ https://issues.apache.org/jira/browse/PIG-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335122#comment-14335122 ]

Anthony Hsu commented on PIG-4392:
----------------------------------

Seems like {{Job.getJob()}} only exists in Hadoop 2 but not Hadoop 1:
* Hadoop 1: https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/jobcontrol/Job.html
* Hadoop 2: https://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/mapred/jobcontrol/Job.html

> RANK BY fails when default_parallel is greater than cardinality of field being ranked by
> ----------------------------------------------------------------------------------------
>
>                 Key: PIG-4392
>                 URL: https://issues.apache.org/jira/browse/PIG-4392
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Anthony Hsu
>            Assignee: Daniel Dai
>             Fix For: 0.15.0
>
>         Attachments: PIG-4392-1.patch, PIG-4392-2.patch
>
>
> To reproduce:
> {code:title=input.txt}
> 1 2 3
> 4 5 6
> 7 8 9
> {code}
> {code:title=rank.pig}
> set default_parallel 4;
> d = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
> e = rank d by a;
> dump e;
> {code}
> If {{default_parallel}} is set to {{3}}, the script succeeds. So I'm guessing RANK BY has issues if the {{default_parallel}} exceeds the cardinality of the field being ranked by.
> I'm seeing this issue with Pig 0.11.1 (which has the PIG-2932 patch applied) and Hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
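
For anyone who wants to confirm the {{Job.getJob()}} difference noted in the comment against their own classpath, a minimal reflection sketch follows. The {{MethodProbe}} class and its {{hasPublicMethod}} helper are hypothetical names used only for illustration; they are not part of Pig or Hadoop, and this is not how either attached patch handles the issue. It only reports whether a public method with the given name is visible on whatever Hadoop jars happen to be loaded.

```java
import java.lang.reflect.Method;

// Hypothetical helper for illustration only; not part of Pig or Hadoop.
final class MethodProbe {
    private MethodProbe() {}

    /**
     * Returns true if the named class can be loaded from the current
     * classpath and exposes a public method with the given name
     * (any parameter list). Returns false if the class is absent.
     */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            // Load without running static initializers; we only inspect the class.
            Class<?> cls = Class.forName(
                    className, false, MethodProbe.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not on this classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the Javadoc links in the comment.
        boolean present = hasPublicMethod(
                "org.apache.hadoop.mapred.jobcontrol.Job", "getJob");
        System.out.println("Job.getJob() available: " + present);
    }
}
```

Run with the Hadoop jars of interest on the classpath. With no Hadoop jars present, the probe simply reports that the method is unavailable, so a {{false}} result alone does not distinguish "Hadoop 1" from "Hadoop not on the classpath".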