Use combiner in cogroup
-----------------------

                 Key: PIG-1735
                 URL: https://issues.apache.org/jira/browse/PIG-1735
             Project: Pig
          Issue Type: Improvement
            Reporter: Thejas M Nair
            Assignee: Thejas M Nair
             Fix For: 0.9.0


As reported by Scott Carey in PIG-479, the combiner is not used for COGROUP,
even when the functions applied to the bags are algebraic.
Quoting from the comment:
"For example, I'm not quite sure why this one doesn't use a combiner - it reads 
~350x as much input bytes from HDFS as its reduce output, a combiner would be 
very effective:

J = COGROUP
UV BY (s, d, h, g, p, pa, st) OUTER,
UC BY (s, d, h, g, p, pa, st) OUTER,
AT BY (s, d, h, g, p, pa, st) OUTER,
V BY (s, d, h, g, p, pa, st) OUTER,
C BY (s, d, h, g, p, pa, st) OUTER;

OUTPUT = FOREACH J GENERATE
FLATTEN(group) as (s, d, h, g, p, pa, st),
COUNT_STAR(C) as c,
COUNT_STAR(V) as v,
SUM(AT.p1) as p1,
SUM(AT.p2) as p2,
SUM(AT.p3) as p3,
SUM(UC.q) as ucq,
SUM(UC.r) as ucr,
SUM(UV.q) as uvq,
SUM(UV.r) as uvr;
"
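
To illustrate why a combiner would help here, the following is a minimal
Python sketch. It is not Pig's implementation. It assumes a toy two-input
cogroup in which every map output record carries a relation tag; the record
layout and the names combine and reduce_side are hypothetical. Because SUM
and COUNT are algebraic, each map task can collapse its records to one
partial (sum, count) pair per key and relation, and the reducer only merges
those pairs.

```python
from collections import defaultdict

# Hypothetical toy data: (relation_tag, group_key, value) records,
# standing in for the map output of a two-input COGROUP.
records = [
    ("AT", "k1", 3), ("AT", "k1", 4), ("C", "k1", 1),
    ("C", "k1", 1), ("AT", "k2", 5), ("C", "k2", 1),
]

def combine(map_output):
    """Partially aggregate per (key, tag) on the map side.

    SUM and COUNT are algebraic, so partial results can be merged
    later without changing the final answer.
    """
    partial = defaultdict(lambda: [0, 0])  # (key, tag) -> [sum, count]
    for tag, key, value in map_output:
        acc = partial[(key, tag)]
        acc[0] += value
        acc[1] += 1
    return [(tag, key, s, c) for (key, tag), (s, c) in partial.items()]

def reduce_side(combined_chunks):
    """Merge the partial (sum, count) pairs for each key and relation."""
    final = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for chunk in combined_chunks:
        for tag, key, s, c in chunk:
            final[key][tag][0] += s
            final[key][tag][1] += c
    return {k: {t: tuple(v) for t, v in tags.items()}
            for k, tags in final.items()}

# Two simulated map tasks, each followed by its own combiner pass.
split_a, split_b = records[:3], records[3:]
result = reduce_side([combine(split_a), combine(split_b)])
print(result)
```

In this sketch the reducer receives at most one record per key and relation
from each map task instead of every raw tuple, which is the kind of saving
the quoted comment expects for a job whose input is far larger than its
output.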

