[jira] [Commented] (DATAFU-88) Port Stanford Core NLP Functionality to DataFu

Matthew Hayes (JIRA) Wed, 11 Feb 2015 14:38:55 -0800

    [ https://issues.apache.org/jira/browse/DATAFU-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317125#comment-14317125 ]


Matthew Hayes commented on DATAFU-88:
-------------------------------------

Thanks Jakob.  I think this feature can be treated as optional.

Suppose we add a compile-time dependency to the project, as in the diff 
below.  The build will then download the library automatically, but the 
library will not be packaged in the final datafu jar.  The UDF will be 
included in the final jar, but it won't work unless you download this 
dependency separately.  We can provide instructions on how to do that.  Does 
this seem okay?

{code}
diff --git a/datafu-pig/build.gradle b/datafu-pig/build.gradle
index ea385d2..56466ed 100644
--- a/datafu-pig/build.gradle
+++ b/datafu-pig/build.gradle
@@ -151,6 +151,9 @@ dependencies {
   autojarred "org.apache.opennlp:opennlp-tools:$openNlpVersion"
   autojarred "org.apache.opennlp:opennlp-uima:$openNlpVersion"
   autojarred "org.apache.opennlp:opennlp-maxent:$openNlpMaxEntVersion"
+  
+  // not autojarred because this is GPL
+  compile "edu.stanford.nlp:stanford-corenlp:$stanfordCoreNlpVersion"
 
   // needed to run jarjar
   jarjar "com.googlecode.jarjar:jarjar:1.3"
@@ -218,4 +221,4 @@ test {
   systemProperty 'datafu.data.dir', file('data')
 
   maxHeapSize = "2G"
-}
\ No newline at end of file
+}
diff --git a/gradle/dependency-versions.gradle b/gradle/dependency-versions.gradle
index 3b0835f..81012fc 100644
--- a/gradle/dependency-versions.gradle
+++ b/gradle/dependency-versions.gradle
@@ -39,4 +39,5 @@ ext {
   jsonVersion="20090211"
   jsr311Version="1.1.1"
   slf4jVersion="1.6.4"
+  stanfordCoreNlpVersion="3.5.0"
 }
{code}
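
One consequence of the change above is that the UDF would fail at runtime if the Stanford jar is missing. As a rough sketch only (not part of the patch), the UDF could probe the classpath and fail with a readable message. The class name OptionalDependencyCheck and its methods are hypothetical names made up for illustration; the only real identifier assumed is the CoreNLP class edu.stanford.nlp.pipeline.StanfordCoreNLP, used here purely as a classpath probe.

```java
// Hypothetical helper, not part of DataFu: checks whether an optional
// dependency is available before a UDF tries to use it.
final class OptionalDependencyCheck {

    // Real CoreNLP entry-point class name, used only as a classpath probe.
    static final String CORE_NLP_CLASS = "edu.stanford.nlp.pipeline.StanfordCoreNLP";

    // Returns true if the named class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalDependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Throws with a readable message if the named class is missing.
    static void require(String className, String hint) {
        if (!isOnClasspath(className)) {
            throw new IllegalStateException(
                "Missing optional dependency (" + className + "). " + hint);
        }
    }

    public static void main(String[] args) {
        // Reports availability instead of failing later with NoClassDefFoundError.
        System.out.println("Stanford CoreNLP on classpath: " + isOnClasspath(CORE_NLP_CLASS));
    }
}
```

A UDF constructor could call require(CORE_NLP_CLASS, "...") with a hint that points at the download instructions, so users see that message rather than a NoClassDefFoundError.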

> Port Stanford Core NLP Functionality to DataFu
> ----------------------------------------------
>
>                 Key: DATAFU-88
>                 URL: https://issues.apache.org/jira/browse/DATAFU-88
>             Project: DataFu
>          Issue Type: New Feature
>    Affects Versions: 1.3.0
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>              Labels: lemmatizer, nlp, pig, pig_udf, stanford, stemmer
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For starters I need the Stanford Core NLP stemmer and lemmatizer. 
> It looks like maybe I can add something generic and feed arguments to code 
> like: props.put("annotators", "tokenize, ssplit, pos, lemma");
> Helpful example of lemmatizing at 
> http://stackoverflow.com/questions/1578062/lemmatization-java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
