Alex Bain created PIG-3562:
------------------------------

             Summary: Implement combiner optimizations for DISTINCT
                 Key: PIG-3562
                 URL: https://issues.apache.org/jira/browse/PIG-3562
             Project: Pig
          Issue Type: Sub-task
          Components: tez
    Affects Versions: tez-branch
            Reporter: Alex Bain
            Assignee: Alex Bain


Currently, DISTINCT is implemented in a straightforward manner per 
https://issues.apache.org/jira/browse/PIG-3538.

However, we can implement two types of combiner optimizations for DISTINCT, 
just as the MRCompiler does for MapReduce:
1. A simple DistinctCombiner that throws away duplicate tuples
2. An optimizer that transforms certain uses of DISTINCT into an algebraic UDF 
form
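
For illustration only, here is a minimal sketch of both ideas in plain Java. 
It does not use Pig, Tez, or Hadoop APIs; the class and method names 
(DistinctCombinerSketch, combine, reduce, initial, intermediate, finalCount) 
are hypothetical, and a tuple is modeled as a List<Object>. The first pair of 
methods shows a combiner dropping duplicates locally before the shuffle. The 
second set assumes the "algebraic" case is something like counting distinct 
values, split into initial/intermediate/final steps so partial results can 
be merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not Pig's DistinctCombiner or any real Pig/Tez class.
class DistinctCombinerSketch {

    // 1. Combiner step: drop duplicate tuples from one task's local output
    //    before it is shuffled. LinkedHashSet keeps first-seen order and
    //    relies on List.equals/hashCode to compare tuples.
    static List<List<Object>> combine(List<List<Object>> localTuples) {
        return new ArrayList<>(new LinkedHashSet<>(localTuples));
    }

    // Reduce step: the same de-duplication across all combined outputs.
    static List<List<Object>> reduce(List<List<List<Object>>> combinedOutputs) {
        Set<List<Object>> seen = new LinkedHashSet<>();
        for (List<List<Object>> part : combinedOutputs) {
            seen.addAll(part);
        }
        return new ArrayList<>(seen);
    }

    // 2. Algebraic form (assumed example: count of distinct tuples).
    //    initial: wrap a single tuple in a one-element set.
    static Set<List<Object>> initial(List<Object> tuple) {
        Set<List<Object>> s = new LinkedHashSet<>();
        s.add(tuple);
        return s;
    }

    //    intermediate: union partial sets; safe to apply any number of times.
    static Set<List<Object>> intermediate(List<Set<List<Object>>> partials) {
        Set<List<Object>> merged = new LinkedHashSet<>();
        for (Set<List<Object>> p : partials) {
            merged.addAll(p);
        }
        return merged;
    }

    //    final: union the remaining partial sets and count the result.
    static long finalCount(List<Set<List<Object>>> partials) {
        return intermediate(partials).size();
    }

    public static void main(String[] args) {
        List<List<Object>> task1 = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("b", 2));
        List<List<Object>> task2 = Arrays.asList(
            Arrays.<Object>asList("b", 2),
            Arrays.<Object>asList("c", 3),
            Arrays.<Object>asList("c", 3));

        // Each task ships 2 tuples instead of 3; the reducer sees 4, not 6.
        List<List<Object>> result =
            reduce(Arrays.asList(combine(task1), combine(task2)));
        System.out.println(result); // prints [[a, 1], [b, 2], [c, 3]]
    }
}
```

The point of the combiner is only to shrink the data shuffled between 
vertices; the final result is the same with or without it, since the reduce 
side de-duplicates again.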



--
This message was sent by Atlassian JIRA
(v6.1#6144)
