[
https://issues.apache.org/jira/browse/DATAFU-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880601#comment-13880601
]
Matthew Hayes commented on DATAFU-14:
-------------------------------------
I cloned your repo and took a look. It could be that autojar is getting
confused. It only tries to include the classes that are absolutely necessary.
If lucene is referencing classes dynamically, then autojar may not discover the
dependency and may strip those classes out. Autojar is supposed to handle cases
like this, but maybe it isn't working here. The other options are:
1) Packaging all required lucene JARs in datafu (under a different namespace of
course) -- requires changing the build.xml
2) Requiring that the lucene JARs be present in the classpath if you want to
use the wrapper functions -- only requires moving your conf for the lucene jars
from "packaged" to "common" in ivy.xml
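
To illustrate the dynamic-reference problem described above, here is a minimal
sketch (not taken from lucene or autojar; the class and method names are
hypothetical, and JDK classes stand in for the real ones). A class that is
named directly in code leaves a class reference in the bytecode that a
dependency-walking tool can follow. A class whose name is assembled from
strings at runtime leaves only string constants, so a tool that follows class
references could plausibly miss it, and the lookup then fails at runtime if
the class was stripped from the JAR.

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicReferenceSketch {

    // Static reference: ArrayList is named directly, so the compiled class
    // carries a class reference that a bytecode-walking tool can discover.
    static List<String> viaStaticReference() {
        return new ArrayList<>();
    }

    // Dynamic reference: the class name is built from strings at runtime.
    // The compiled class holds no class reference to the target.
    // Returns null if the class cannot be found or instantiated.
    static Object viaDynamicReference(String packageName, String simpleName) {
        String className = packageName + "." + simpleName;
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Present on the classpath, so the lookup succeeds.
        System.out.println(viaDynamicReference("java.util", "ArrayList") != null);
        // Hypothetical class standing in for one that was stripped from the JAR.
        System.out.println(viaDynamicReference("org.example.stripped", "MissingImpl") != null);
    }
}
```

If something like the second case is what is happening here, option 2 avoids
it because the full lucene JARs stay on the classpath untouched.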
We may want to start a separate discussion on JAR packaging actually and come
up with some guidelines or policy. I started doing this when we added the
fastutil dependency, as fastutil is a very large JAR that you don't want to
include in its entirety. Autojar is nice because it strips out what you don't
need. It's also nice to not have to worry about other JARs. Just get the
datafu JAR, register it and you're set. But maybe there are cases where the
user should get the necessary JARs (rather than having them packaged),
especially in cases where the UDF is a somewhat simple wrapper around
functionality from another JAR. Or, we could ship a separate artifact with the
UDFs plus the necessary dependencies (e.g. lucene) in a single JAR, like
datafu-pig-lucene-x.y.z.jar. Or, as another example,
datafu-pig-opennlp-x.y.z.jar. I'm not sure what the right approach is; I'll
have to think on it some more.
> Add NGram Tokenizer to datafu.pig.text.lucene
> ---------------------------------------------
>
> Key: DATAFU-14
> URL: https://issues.apache.org/jira/browse/DATAFU-14
> Project: DataFu
> Issue Type: Improvement
> Environment: plants
> Reporter: Russell Jurney
>
> See
> https://github.com/rjurney/datafu/blob/lucene/src/java/datafu/pig/text/lucene/NGramTokenize.java
> Held up by
> http://stackoverflow.com/questions/21064520/how-to-use-lucene-shinglefilter-could-not-find-implementing-class-for-org-apach/21067142?noredirect=1#21067142