[ https://issues.apache.org/jira/browse/DATAFU-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317125#comment-14317125 ]
Matthew Hayes commented on DATAFU-88:
-------------------------------------

Thanks Jakob. I think this feature can be treated as optional. Suppose we add a compile-time dependency to the project, like below. When you build, the library is downloaded automatically, but it is not packaged in the final datafu jar. The UDF is included in the final JAR, but it won't work unless you download this dependency yourself. We can provide instructions on how to do that. Does this seem okay?

{code}
diff --git a/datafu-pig/build.gradle b/datafu-pig/build.gradle
index ea385d2..56466ed 100644
--- a/datafu-pig/build.gradle
+++ b/datafu-pig/build.gradle
@@ -151,6 +151,9 @@ dependencies {
   autojarred "org.apache.opennlp:opennlp-tools:$openNlpVersion"
   autojarred "org.apache.opennlp:opennlp-uima:$openNlpVersion"
   autojarred "org.apache.opennlp:opennlp-maxent:$openNlpMaxEntVersion"
+
+  // not autojarred because this is GPL
+  compile "edu.stanford.nlp:stanford-corenlp:$stanfordCoreNlpVersion"
 
   // needed to run jarjar
   jarjar "com.googlecode.jarjar:jarjar:1.3"
@@ -218,4 +221,4 @@ test {
   systemProperty 'datafu.data.dir', file('data')
   maxHeapSize = "2G"
-}
\ No newline at end of file
+}
diff --git a/gradle/dependency-versions.gradle b/gradle/dependency-versions.gradle
index 3b0835f..81012fc 100644
--- a/gradle/dependency-versions.gradle
+++ b/gradle/dependency-versions.gradle
@@ -39,4 +39,5 @@ ext {
   jsonVersion="20090211"
   jsr311Version="1.1.1"
   slf4jVersion="1.6.4"
+  stanfordCoreNlpVersion="3.5.0"
 }
{code}

> Port Stanford Core NLP Functionality to DataFu
> ----------------------------------------------
>
>                 Key: DATAFU-88
>                 URL: https://issues.apache.org/jira/browse/DATAFU-88
>             Project: DataFu
>          Issue Type: New Feature
>    Affects Versions: 1.3.0
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>              Labels: lemmatizer, nlp, pig, pig_udf, stanford, stemmer
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For starters I need the Stanford Core NLP stemmer and lemmatizer.
> It looks like maybe I can add something generic and feed arguments to code
> like: props.put("annotators", "tokenize, ssplit, pos, lemma");
> Helpful example of lemmatizing at
> http://stackoverflow.com/questions/1578062/lemmatization-java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
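
Since the proposal above means the UDF ships in the jar while the Stanford library does not, one option is for the UDF to check for the library at startup and fail with a clear message. Below is a minimal sketch of such a check, not anything that exists in DataFu today. The class name `OptionalDependencyCheck` and the method `isClassAvailable` are hypothetical; `edu.stanford.nlp.pipeline.StanfordCoreNLP` is assumed to be the entry point the UDF would need.

```java
// Hypothetical helper (not part of DataFu): reports whether an optional
// library is present on the classpath at runtime.
final class OptionalDependencyCheck {

    // Assumption: the UDF would rely on this CoreNLP pipeline class.
    static final String CORENLP_CLASS = "edu.stanford.nlp.pipeline.StanfordCoreNLP";

    private OptionalDependencyCheck() {
    }

    // Returns true if the named class can be located, without initializing it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                    OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassAvailable(CORENLP_CLASS)) {
            System.out.println("Stanford CoreNLP found on the classpath.");
        } else {
            System.out.println("Stanford CoreNLP is missing; download it "
                    + "separately and add it to the classpath.");
        }
    }
}
```

A UDF constructor could call `isClassAvailable` and throw an exception that points to the download instructions, so users see an explicit message instead of a bare `NoClassDefFoundError`.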