eyala opened a new pull request, #46: URL: https://github.com/apache/datafu/pull/46
A new method for when you want to de-duplicate records without losing any "real" data. For example, suppose a server creates events with an autogenerated event id, and some events are occasionally duplicated. You don't want duplicate rows that differ only in their event ids, but if any of the other fields differ you want to keep those rows (with their original event ids). If you didn't need the event ids, you could simply drop the event id column and de-duplicate. To keep at least one event id for each set of duplicates, however, you need to tediously list all the other columns.

JIRA: https://issues.apache.org/jira/browse/DATAFU-177
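
To make the intended semantics concrete, here is a minimal plain-Python sketch of the behaviour described above. It is an illustration only, not the code in this pull request and not the DataFu API: the function name `dedup_ignoring`, the field names, and the sample events are all hypothetical, and it assumes every field value is hashable.

```python
def dedup_ignoring(rows, ignored):
    """Keep the first row for each distinct combination of the non-ignored fields.

    Hypothetical helper for illustration; not part of DataFu.
    """
    seen = set()
    kept = []
    for row in rows:
        # Build the key from every field except the ignored ones,
        # so the caller never has to list the remaining columns by hand.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignored))
        if key not in seen:
            seen.add(key)
            kept.append(row)  # the surviving row keeps its original event id
    return kept


# Hypothetical sample data: the second event repeats the first apart from its id.
events = [
    {"event_id": 1, "user": "a", "action": "click"},
    {"event_id": 2, "user": "a", "action": "click"},  # duplicate apart from event_id
    {"event_id": 3, "user": "a", "action": "view"},   # differs in a real field
]

deduped = dedup_ignoring(events, {"event_id"})
print(deduped)
```

The caller names only the column to ignore. The first two events collapse into one row that retains an event id, while the third survives because a non-ignored field differs.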