[
https://issues.apache.org/jira/browse/DATAFU-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883621#comment-13883621
]
Matthew Hayes commented on DATAFU-14:
-------------------------------------
Can you describe in more detail the Lucene functionality you'd like to add? I
don't know very much about Lucene, so it's hard for me to see how all the future
UDFs could fit together. If we are going to wrap Lucene functionality with UDFs,
we should make sure we have a cohesive set that works together. For example,
the n-gram tokenizer UDF produces a bag of n-grams from the input string. But
then what do you do with this data? Will the UDFs written in the future that
wrap other Lucene functionality be able to consume it? I think if we are going
to commit any Lucene UDFs, we should start with an MVP rather than commit them
individually.
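To make "a bag of n-grams" concrete, here is a minimal plain-Java sketch. It assumes word-level n-grams (shingles) produced by splitting on whitespace; character n-grams would be another reasonable reading. It uses neither the Lucene nor the Pig API and is not the linked NGramTokenize implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NGramSketch {
    // Hypothetical helper: returns every contiguous run of n whitespace-separated
    // words, in input order. Duplicates are kept, so the result behaves like a bag.
    public static List<String> wordNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> grams = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return grams; // no words, so no n-grams
        }
        String[] words = trimmed.split("\\s+");
        // Slide a window of n words across the input.
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints [the quick, quick brown, brown fox]
        System.out.println(wordNGrams("the quick brown fox", 2));
    }
}
```

Any downstream UDF would have to agree on this representation (one string per n-gram, order preserved, duplicates kept), which is exactly the cohesion question raised in the comment.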
> Add NGram Tokenizer to datafu.pig.text.lucene
> ---------------------------------------------
>
> Key: DATAFU-14
> URL: https://issues.apache.org/jira/browse/DATAFU-14
> Project: DataFu
> Issue Type: Improvement
> Environment: plants
> Reporter: Russell Jurney
>
> See
> https://github.com/rjurney/datafu/blob/lucene/src/java/datafu/pig/text/lucene/NGramTokenize.java
> Held up by
> http://stackoverflow.com/questions/21064520/how-to-use-lucene-shinglefilter-could-not-find-implementing-class-for-org-apach/21067142?noredirect=1#21067142
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)