[ https://issues.apache.org/jira/browse/DATAFU-67?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mohammad S Amin updated DATAFU-67:
----------------------------------
    Attachment: DATAFU-67

> Adding Simple SimHash for near duplicate detection
> --------------------------------------------------
>
>                 Key: DATAFU-67
>                 URL: https://issues.apache.org/jira/browse/DATAFU-67
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Mohammad S Amin
>         Attachments: DATAFU-67
>
>
> Adding Simple SimHash for near-duplicate detection. The UDF computes a SimHash
> for each document, which can then be compared across multiple documents.

--
This message was sent by Atlassian JIRA
(v6.2#6252)