[jira] [Commented] (TIKA-1787) Include Stanford Name Entity Recognition in Tika

Thamme Gowda N (JIRA) Wed, 11 Nov 2015 12:34:39 -0800

    [ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001042#comment-15001042 ]


Thamme Gowda N commented on TIKA-1787:
--------------------------------------

With #61, the CoreNLP NER can be activated with the following steps:

- Add the CoreNLP jars and models to the classpath. If you are using Maven, add:
{code}
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${corenlp.version}</version>
</dependency>

<!-- This is a HUGE FILE -->
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${corenlp.version}</version>
    <classifier>models</classifier>
</dependency>
{code}

- Set the system property "ner.impl.class" to 
"org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser".
   You can do this either by calling `System.setProperty()` before instantiating 
the Tika parsers in code, or on the command line by passing 
"-Dner.impl.class=org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser" when 
launching the JVM.

- Activate the NamedEntityParser
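
For the system property step, here is a minimal sketch using only the JDK, so it runs without Tika on the classpath. The class name `NerImplProperty` and the method `configure()` are hypothetical helpers, not part of Tika; only the property name and the recogniser class name are taken from the steps above. The sketch assumes a value passed with `-Dner.impl.class=...` should take precedence over the default set in code.

```java
// Hypothetical helper, not part of Tika: sets the NER implementation property.
final class NerImplProperty {
    // Property name and implementation class name as quoted in the steps above.
    static final String KEY = "ner.impl.class";
    static final String CORENLP_IMPL =
            "org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser";

    // Applies the CoreNLP default unless a value was already supplied,
    // for example with -Dner.impl.class=... on the command line.
    // Returns the value that is in effect afterwards.
    static String configure() {
        String existing = System.getProperty(KEY);
        if (existing == null || existing.trim().isEmpty()) {
            System.setProperty(KEY, CORENLP_IMPL);
            return CORENLP_IMPL;
        }
        return existing;
    }

    public static void main(String[] args) {
        // Call configure() before any Tika parser is instantiated.
        System.out.println(KEY + "=" + configure());
    }
}
```

Calling `NerImplProperty.configure()` as the first statement of the application keeps the command-line override working while still giving a default when no `-D` flag is passed.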

A demo project setup is available at: https://github.com/thammegowda/tika-ner-corenlp





> Include Stanford Name Entity Recognition in Tika
> ------------------------------------------------
>
>                 Key: TIKA-1787
>                 URL: https://issues.apache.org/jira/browse/TIKA-1787
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime, parser
>    Affects Versions: 1.12
>         Environment: Java 1.8, Mac OSX 10.11
>            Reporter: Yueheng He
>            Assignee: Chris A. Mattmann
>              Labels: features, newbie, test
>             Fix For: 1.12
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Using the Stanford Name Entity Recognition, Tika will be able to extract named 
> entities like PERSON, ORGANIZATION, LOCATION, etc. from the given text. The 
> extracted named entities will be added to the metadata.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
