Looking at DataFu's JIRA, there seem to be quite a few UDFs in various
states of acceptance into the project. I tried to categorize them, skipping
those that seemed problematic for some reason or didn't have a patch
attached:


*No response from the project:*

New UDF for Histogram / Frequency counting
<https://issues.apache.org/jira/browse/DATAFU-98>

Edit distance <https://issues.apache.org/jira/browse/DATAFU-87>

Create IncrementalAvroStorage UDF for incrementally processing date
partitioned data <https://issues.apache.org/jira/browse/DATAFU-71>

simple hash for near duplicate detection
<https://issues.apache.org/jira/browse/DATAFU-67>


Is there an existing process for deciding whether to accept new
contributions? I don't know if the submitters are still around, but we
should probably try to give them some sort of response.


*In the process of review:*

Add datafu.text.ToJson UDF to serialize any relation/field as a JSON String
<https://issues.apache.org/jira/browse/DATAFU-9>

Add DataFu MR project (obviously not a UDF)
<https://issues.apache.org/jira/browse/DATAFU-51>

UDF's to handle map type <https://issues.apache.org/jira/browse/DATAFU-34>

NCDG <https://issues.apache.org/jira/browse/DATAFU-60>

Aho-Corasick <https://issues.apache.org/jira/browse/DATAFU-65>

New UDF - TupleDiff <https://issues.apache.org/jira/browse/DATAFU-119>

Some of these seem close to being finished. I think I'll take a look at the
first one (DATAFU-9, ToJson).

Regards,
Eyal
