Johannes Schwenk created PIG-3042:
-------------------------------------
Summary: Implement new SPLIT_DISTINCT relational operator
Key: PIG-3042
URL: https://issues.apache.org/jira/browse/PIG-3042
Project: Pig
Issue Type: New Feature
Reporter: Johannes Schwenk
If DISTINCT operated as a function, we could do something like this:
{code}
SPLIT data INTO
new_entries IF COUNT(DISTINCT(*)) > 1,
duplicate_entries OTHERWISE;
{code}
Since this is unfortunately not the case (see also PIG-826), I would like to
propose a new SPLIT_DISTINCT operator (the name is up for discussion) that acts
the way the above code intends. One would then just have to write:
{code}
SPLIT_DISTINCT data INTO new_entries, duplicate_entries;
{code}
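For comparison, roughly the same effect can be had today with GROUP and SPLIT.
This is only a sketch under assumptions: the input path and the three-field
schema (host, ts, msg) are hypothetical, and "duplicate" is read as "an
identical tuple occurs more than once". Other readings are possible, e.g.
keeping one copy of each duplicated tuple in new_entries.
{code}
-- Hypothetical input path and schema, for illustration only.
data = LOAD 'logs' AS (host:chararray, ts:long, msg:chararray);

-- Group identical tuples together by using every field as the key.
grouped = GROUP data BY (host, ts, msg);

-- One row per distinct tuple, plus the number of times it occurred.
-- COUNT_STAR is used so that tuples with a null first field are counted too.
counted = FOREACH grouped GENERATE
    FLATTEN(group) AS (host, ts, msg),
    COUNT_STAR(data) AS n;

-- Tuples seen exactly once vs. tuples seen more than once.
SPLIT counted INTO
    new_entries IF n == 1,
    duplicate_entries IF n > 1;
{code}
Every field has to be listed by hand in both the GROUP key and the FLATTEN
schema, which is the boilerplate a dedicated operator would remove.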
Separating duplicates from the rest of a data set, e.g. log data, is a common
scenario, I think, and the new operator would make this task a lot simpler.