[ https://issues.apache.org/jira/browse/PIG-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xianda Ke updated PIG-4842:
---------------------------
Description:
Scenario:
1. Input data:
cat collectedgroup1
1
1
2
2. Pig script:
A = LOAD 'collectedgroup1' USING myudfs.DummyCollectableLoader() AS (id);
B = GROUP A by $0 USING 'collected';
C = GROUP B by $0 USING 'collected';
DUMP C;
The expected output:
{code}
(1,{(1,{(1),(1)})})
(2,{(2,{(2)})})
{code}
The actual output:
{code}
(1,{(1,{(1),(1)})})
(1,)
(2,{(2,{(2)})})
{code}
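As a sanity check on the expected output above, here is a minimal Python sketch that models what a collected group is assumed to do: it groups consecutive rows that share a key in a single pass, relying on the loader delivering all rows for a key contiguously. This is only a model of the semantics, not Pig's or the Spark branch's implementation, and the helper name `collected_group` is hypothetical.

```python
from itertools import groupby


def collected_group(rows, key):
    """Group consecutive rows sharing a key, in one pass (hypothetical helper).

    Models the assumption behind USING 'collected': every row for a given
    key arrives contiguously, so no shuffle or sort is required.
    """
    return [(k, list(group)) for k, group in groupby(rows, key=key)]


# Input rows as loaded from 'collectedgroup1': 1, 1, 2
A = [(1,), (1,), (2,)]

# B = GROUP A by $0 USING 'collected';
B = collected_group(A, key=lambda t: t[0])

# C = GROUP B by $0 USING 'collected';
C = collected_group(B, key=lambda t: t[0])

for row in C:
    print(row)
```

Under this model, C contains exactly two rows, one per key, and no row with an empty bag such as the `(1,)` seen in the actual output.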
was:
Scenario:
1. Input data:
cat collectedgroup1
1
1
2
2. Pig script:
A = LOAD 'collectedgroup1' USING myudfs.DummyCollectableLoader() AS (id);
B = GROUP A by $0 USING 'collected';
C = GROUP B by $0 USING 'collected';
DUMP C;
The expected output:
(1,{(1,{(1),(1)})})
(2,{(2,{(2)})})
The actual output:
(1,{(1,{(1),(1)})})
(1,)
(2,{(2,{(2)})})
> Collected group doesn't work in some cases
> ------------------------------------------
>
> Key: PIG-4842
> URL: https://issues.apache.org/jira/browse/PIG-4842
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Xianda Ke
> Assignee: Xianda Ke
> Fix For: spark-branch