Looking at DataFu's JIRA, there are quite a few UDFs in various stages of
acceptance into the project. I tried to categorize them, skipping those that
seemed problematic for some reason or didn't have a patch attached:

*No response from the project:*

New UDF for Histogram / Frequency counting

Edit distance <https://issues.apache.org/jira/browse/DATAFU-87>

Create IncrementalAvroStorage UDF for incrementally processing date
partitioned data <https://issues.apache.org/jira/browse/DATAFU-71>

simple hash for near duplicate detection

Is there any existing process for deciding how to accept new content? I
don't know if the submitters are still around, but we should probably try
to give some sort of response.

*In the process of review:*

Add datafu.text.ToJson UDF to serialize any relation/field as a JSON String

Add DataFu MR project (obviously not a UDF)

UDF's to handle map type <https://issues.apache.org/jira/browse/DATAFU-34>

NCDG <https://issues.apache.org/jira/browse/DATAFU-60>

Aho-Corasick <https://issues.apache.org/jira/browse/DATAFU-65>

New UDF - TupleDiff <https://issues.apache.org/jira/browse/DATAFU-119>

Some of these seem close to being finished. I think I'll take a look at the
first one.

