[https://issues.apache.org/jira/browse/PIG-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Rohini Palaniswamy updated PIG-4536:
------------------------------------
    Description: 
{code}
data_group = GROUP A BY (f1, f2) PARALLEL 100;
group_result = FOREACH data_group {
B = LIMIT A.f3 1;
GENERATE group, SUM(A.f3), SUM(A.f4), SUM(A.f5), SUM(A.f6), FLATTEN(B);
};
{code}
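In a script like this, the nested LIMIT causes Pig to turn off the combiner optimization for the whole FOREACH. Even the SUMs are then computed in the reducers over the complete bags, so the job consumes a lot of memory and is slow. We should implement LIMIT using the combiner in cases like this.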
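The idea can be sketched outside Pig. A LIMIT without an ORDER BY may return any n tuples of the bag. Each combiner can therefore trim its partial bag to n tuples alongside the partial SUMs. The reducer then only has to merge those trimmed bags and trim again. The Java sketch below is a hypothetical illustration of that split for a single group key. `CombinableLimit`, `partialLimit`, `finalLimit` and `partialSum` are made-up names. They are not Pig's `Algebraic` API or the actual fix for this issue.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Pig code): a nested LIMIT n without ORDER BY
 * split into a combiner step and a reducer step, next to an algebraic SUM.
 */
public class CombinableLimit {

    /** Combiner side: keep at most n values from one partial bag. */
    static List<Long> partialLimit(List<Long> partialBag, int n) {
        return new ArrayList<>(partialBag.subList(0, Math.min(n, partialBag.size())));
    }

    /** Combiner side: partial SUM over the same partial bag. */
    static long partialSum(List<Long> partialBag) {
        long sum = 0;
        for (long v : partialBag) {
            sum += v;
        }
        return sum;
    }

    /** Reducer side: merge the already-trimmed partial bags and trim to n again. */
    static List<Long> finalLimit(List<List<Long>> partials, int n) {
        List<Long> out = new ArrayList<>();
        for (List<Long> p : partials) {
            for (Long v : p) {
                if (out.size() == n) {
                    return out; // any n tuples are a valid LIMIT result
                }
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two map tasks emit f3 values for the same group key (f1, f2).
        List<Long> mapper1 = List.of(10L, 11L, 12L);
        List<Long> mapper2 = List.of(20L, 21L);
        int n = 1; // corresponds to LIMIT A.f3 1

        // Each combiner ships only a partial sum and at most n tuples.
        List<List<Long>> limited = List.of(partialLimit(mapper1, n), partialLimit(mapper2, n));
        long sum = partialSum(mapper1) + partialSum(mapper2);

        System.out.println("sum=" + sum + " limit=" + finalLimit(limited, n)); // sum=74 limit=[10]
    }
}
```

With this split, the reducer receives at most n tuples per combiner output instead of the full bag. This is where the memory saving would come from.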

  was:
data_group = GROUP A BY (f1, f2) PARALLEL 100;
group_result = FOREACH data_group {
B = LIMIT A.f3 1;

GENERATE group,  
SUM(A.f3),
SUM(A.f4),
SUM(A.f5),
SUM(A.f6),
FLATTEN(B);
};

A script like this has combiner optimization turned off and so consumes a lot 
of memory and is slow. We should implement LIMIT using Combiner in cases like 
this.


> LIMIT inside nested foreach should have combiner optimization
> -------------------------------------------------------------
>
>                 Key: PIG-4536
>                 URL: https://issues.apache.org/jira/browse/PIG-4536
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>              Labels: Performance
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
