[ 
https://issues.apache.org/jira/browse/DATAFU-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883621#comment-13883621
 ] 

Matthew Hayes commented on DATAFU-14:
-------------------------------------

Can you say more about the Lucene functionality you'd like to add?  I don't 
know much about Lucene, so it's hard for me to see how the future UDFs would 
fit together.  If we are going to wrap Lucene functionality with UDFs, we 
should make sure we have a cohesive set that works together.  For example, the 
n-gram tokenizer UDF produces a bag of n-grams from the input string.  But 
then what do you do with this data?  Will future UDFs that wrap other Lucene 
functionality be able to consume it?  I think if we are going to commit any 
Lucene UDFs, we should start with an MVP rather than commit them 
individually.
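
To make the "bag of n-grams" output concrete, here is a minimal plain-Java 
sketch of word-level n-gram generation.  It is not the NGramTokenize UDF from 
the linked branch and does not use Lucene or Pig; the class name NGramSketch, 
the method wordNGrams, and the whitespace tokenization are all assumptions 
made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not the DataFu UDF and not Lucene's tokenizer.
public class NGramSketch {

    // Returns every run of n consecutive whitespace-separated words in text.
    // Assumption: words are split on whitespace; a real analyzer may differ.
    public static List<String> wordNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        String trimmed = text.trim();
        if (n < 1 || trimmed.isEmpty()) {
            return grams; // nothing to emit
        }
        String[] words = trimmed.split("\\s+");
        // Slide a window of n words across the input.
        for (int i = 0; i + n <= words.length; i++) {
            grams.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [the quick, quick brown, brown fox]
        System.out.println(wordNGrams("the quick brown fox", 2));
    }
}
```

In a Pig UDF each of these strings would presumably become one tuple in the 
output bag, which is the shape any downstream UDF would need to accept.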

> Add NGram Tokenizer to datafu.pig.text.lucene
> ---------------------------------------------
>
>                 Key: DATAFU-14
>                 URL: https://issues.apache.org/jira/browse/DATAFU-14
>             Project: DataFu
>          Issue Type: Improvement
>         Environment: plants
>            Reporter: Russell Jurney
>
> See 
> https://github.com/rjurney/datafu/blob/lucene/src/java/datafu/pig/text/lucene/NGramTokenize.java
> Held up by 
> http://stackoverflow.com/questions/21064520/how-to-use-lucene-shinglefilter-could-not-find-implementing-class-for-org-apach/21067142?noredirect=1#21067142



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
