Alex Bain created PIG-3562:
------------------------------
Summary: Implement combiner optimizations for DISTINCT
Key: PIG-3562
URL: https://issues.apache.org/jira/browse/PIG-3562
Project: Pig
Issue Type: Sub-task
Components: tez
Affects Versions: tez-branch
Reporter: Alex Bain
Assignee: Alex Bain
Currently, DISTINCT on the tez-branch is implemented in a straightforward
manner, without any combiner optimizations, per
https://issues.apache.org/jira/browse/PIG-3538.
However, we can implement two types of combiner optimizations for DISTINCT,
just as the MRCompiler does for MapReduce:
1. A simple DistinctCombiner that discards duplicate tuples before the shuffle
2. An optimizer that transforms certain uses of DISTINCT into an algebraic UDF
form
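
To illustrate the first optimization, here is a minimal, self-contained sketch
of what a distinct combiner buys us. It is not the actual Pig or Tez code: the
class name DistinctCombinerSketch, the combine/reduce helpers, and the use of
strings as stand-ins for tuples are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; these are not Pig's or Tez's real classes.
class DistinctCombinerSketch {

    // Combiner step: drop duplicate tuples within a single task's local output,
    // so fewer tuples have to be shuffled to the downstream vertex.
    static List<String> combine(List<String> localTuples) {
        return new ArrayList<>(new LinkedHashSet<>(localTuples));
    }

    // Reduce step: merge the combined outputs of all tasks and remove the
    // duplicates that remain across tasks.
    static List<String> reduce(List<List<String>> combinedOutputs) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (List<String> part : combinedOutputs) {
            seen.addAll(part);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Strings stand in for Pig tuples (an assumption for this sketch).
        List<String> task1 = Arrays.asList("(a,1)", "(a,1)", "(b,2)");
        List<String> task2 = Arrays.asList("(b,2)", "(c,3)", "(c,3)");

        List<String> combined1 = combine(task1);
        List<String> combined2 = combine(task2);

        int shuffled = combined1.size() + combined2.size();
        int original = task1.size() + task2.size();
        System.out.println(shuffled + " tuples shuffled instead of " + original);
        System.out.println(reduce(Arrays.asList(combined1, combined2)));
    }
}
```

The combiner cannot remove duplicates that span tasks, so the final distinct
pass is still required; the combiner only reduces the volume of data shuffled.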
--
This message was sent by Atlassian JIRA
(v6.1#6144)