Use combiner in cogroup
-----------------------
Key: PIG-1735
URL: https://issues.apache.org/jira/browse/PIG-1735
Project: Pig
Issue Type: Improvement
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.9.0
As reported by Scott Carey in PIG-479, combiner does not get used for co-group,
even if the functions applied on the bags are algebraic . -
Quoting from the comment -
"For example, I'm not quite sure why this one doesn't use a combiner - it reads
~350x as much input bytes from HDFS as its reduce output, a combiner would be
very effective:
J = COGROUP
UV BY (s, d, h, g, p, pa, st) OUTER,
UC BY (s, d, h, g, p, pa, st) OUTER,
AT BY (s, d, h, g, p, pa, st) OUTER,
V BY (s, d, h, g, p, pa, st) OUTER,
C BY (s, d, h, g, p, pa, st) OUTER;
OUTPUT = FOREACH J GENERATE
FLATTEN(group) as (s, d, h, g, p, pa, st),
COUNT_STAR(C) as c,
COUNT_STAR(V) as v,
SUM(AT.p1) as p1,
SUM(AT.p2) as p2,
SUM(AT.p3) as p3,
SUM(UC.q) as ucq,
SUM(UC.r) as ucr,
SUM(UV.q) as uvq,
SUM(UV.r) as uvr;
"
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.