Anthony Hsu created PIG-4392:
--------------------------------
Summary: RANK BY fails when default_parallel is greater than cardinality of field being ranked by
Key: PIG-4392
URL: https://issues.apache.org/jira/browse/PIG-4392
Project: Pig
Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Anthony Hsu
To reproduce:
{code:title=input.txt}
1 2 3
4 5 6
7 8 9
{code}
{code:title=rank.pig}
set default_parallel 4;
d = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
e = rank d by a;
dump e;
{code}
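With {{default_parallel}} set to {{4}}, as above, the script fails. If
{{default_parallel}} is set to {{3}}, the script succeeds. Field {{a}} has only
3 distinct values in this input, so my guess is that RANK BY fails whenever
{{default_parallel}} exceeds the cardinality of the field being ranked by.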
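For comparison, here is a sketch of the passing variant: the same script with only {{default_parallel}} lowered to {{3}}, which matches the number of distinct values of {{a}} in input.txt. The file name {{rank_parallel3.pig}} is made up for illustration, and this only restates the observation above; it is not a confirmed workaround.
{code:title=rank_parallel3.pig}
-- Identical to rank.pig except for default_parallel.
-- 3 equals the number of distinct values of field a in input.txt.
set default_parallel 3;
d = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
-- RANK BY a: this is the statement that fails when default_parallel is 4.
e = rank d by a;
dump e;
{code}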
I'm seeing this issue with Pig 0.11.1 (which has the PIG-2932 patch applied)
and Hadoop 2.3.0.