Rohini Palaniswamy created PIG-4536:
---------------------------------------
Summary: LIMIT inside nested foreach should have combiner optimization
Key: PIG-4536
URL: https://issues.apache.org/jira/browse/PIG-4536
Project: Pig
Issue Type: Improvement
Reporter: Rohini Palaniswamy
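-- Assumed input: the report does not show how A is loaded (path and schema are hypothetical)
A = LOAD 'input' AS (f1, f2, f3, f4, f5, f6);
data_group = GROUP A BY (f1, f2) PARALLEL 100;
group_result = FOREACH data_group {
    -- nested LIMIT: keep one (arbitrary) f3 value per group
    B = LIMIT A.f3 1;
    GENERATE group,
        SUM(A.f3),
        SUM(A.f4),
        SUM(A.f5),
        SUM(A.f6),
        FLATTEN(B);
};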
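A script like this runs with the combiner optimization turned off. Because the FOREACH contains a nested LIMIT, the combiner is not used even for the algebraic SUMs, so every input tuple is shipped to the reducers and each group's full bag is processed there. This consumes a lot of memory and is slow. LIMIT should be implemented with the combiner in cases like this: each map-side combine can keep at most N tuples per group, and the reducer applies the same limit again when merging the partial results.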
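A minimal sketch of how a combinable nested LIMIT could behave, simulating the map, combine and reduce phases in plain Python outside Pig. It rests on the fact that Pig's LIMIT does not guarantee which tuples are returned, so every stage may keep any N values and truncating partial results is safe. The fields f1..f6 come from the script above; LIMIT_N, initial, combine, run_combine and reduce_phase are hypothetical names for illustration, not Pig APIs.

```python
# Hypothetical model of the script's FOREACH (not Pig code): per group (f1, f2)
# keep running SUMs of f3..f6 plus at most LIMIT_N values of f3.
LIMIT_N = 1


def initial(row):
    """Map side: turn one input tuple into a (key, partial) pair."""
    f1, f2, f3, f4, f5, f6 = row
    return (f1, f2), ([f3, f4, f5, f6], [f3][:LIMIT_N])


def combine(a, b):
    """Merge two partials; the same function serves combiner and reducer."""
    sums_a, lim_a = a
    sums_b, lim_b = b
    sums = [x + y for x, y in zip(sums_a, sums_b)]
    # LIMIT picks arbitrary tuples, so truncating here is valid at every
    # stage and bounds each partial to LIMIT_N values instead of a full bag.
    return sums, (lim_a + lim_b)[:LIMIT_N]


def run_combine(rows):
    """One map task: merge partials per key as rows arrive (the combiner)."""
    partials = {}
    for row in rows:
        key, part = initial(row)
        partials[key] = combine(partials[key], part) if key in partials else part
    return partials


def reduce_phase(partial_lists):
    """Reduce side: merge partials from all map tasks and emit final rows."""
    merged = {}
    for partials in partial_lists:
        for key, part in partials.items():
            merged[key] = combine(merged[key], part) if key in merged else part
    out = []
    for key, (sums, limited) in sorted(merged.items()):
        for f3 in limited:  # mirrors FLATTEN(B)
            out.append((key, *sums, f3))
    return out


# Two simulated map tasks feeding one reducer.
map1 = run_combine([("a", "x", 1, 2, 3, 4), ("a", "x", 5, 6, 7, 8)])
map2 = run_combine([("a", "x", 9, 1, 1, 1), ("b", "y", 2, 2, 2, 2)])
result = reduce_phase([map1, map2])
print(result)
# [(('a', 'x'), 15, 9, 11, 13, 1), (('b', 'y'), 2, 2, 2, 2, 2)]
```

Each partial carries four numbers and at most LIMIT_N values regardless of group size, which is what would let the whole FOREACH stay combinable. If the nested LIMIT followed a nested ORDER BY, the partial step would instead have to keep the top N by the sort key.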