Looking at DataFu's JIRA, it seems there are quite a few UDFs in various states of acceptance into the project. I tried to categorize them, skipping those that seemed problematic for some reason or that didn't have any patch request attached:

*No response from the project:*

New UDF for Histogram / Frequency counting <https://issues.apache.org/jira/browse/DATAFU-98>
Edit distance <https://issues.apache.org/jira/browse/DATAFU-87>
Create IncrementalAvroStorage UDF for incrementally processing date partitioned data <https://issues.apache.org/jira/browse/DATAFU-71>
simple hash for near duplicate detection <https://issues.apache.org/jira/browse/DATAFU-67>

Is there an existing process for deciding how to accept new content? I don't know if the submitters are still around, but we should probably try to give them some sort of response.

*In the process of review:*

Add datafu.text.ToJson UDF to serialize any relation/field as a JSON String <https://issues.apache.org/jira/browse/DATAFU-9>
Add DataFu MR project (obviously not a UDF) <https://issues.apache.org/jira/browse/DATAFU-51>
UDF's to handle map type <https://issues.apache.org/jira/browse/DATAFU-34>
NCDG <https://issues.apache.org/jira/browse/DATAFU-60>
Aho-Corasick <https://issues.apache.org/jira/browse/DATAFU-65>
New UDF - TupleDiff <https://issues.apache.org/jira/browse/DATAFU-119>

Some of these seem close to being finished. I think I'll take a look at the first one.

Regards,
Eyal