[
https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001042#comment-15001042
]
Thamme Gowda N commented on TIKA-1787:
--------------------------------------
With #61, the CoreNLP NER can be activated with the following steps:
- Add the CoreNLP jars and models to the classpath. If you are using Maven, add:
{code}
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${corenlp.version}</version>
</dependency>
<!-- This is a HUGE FILE -->
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>${corenlp.version}</version>
    <classifier>models</classifier>
</dependency>
{code}
- Set the system property "ner.impl.class" to
"org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser".
You can do this either by calling `System.setProperty()` before instantiating
Tika parsers in code, or on the command line by passing
"-Dner.impl.class=org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser" when
launching the JVM.
- Activate the NamedEntityParser
A demo project setup is at: https://github.com/thammegowda/tika-ner-corenlp
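
The system-property step above can be sketched in plain Java as follows. This is only a minimal illustration using the JDK; the class name `NerSetup` and the choice to leave an existing `-Dner.impl.class` value untouched are assumptions made for this example, not part of Tika. It must run before any Tika parser is instantiated.

```java
public class NerSetup {
    // Property name and value as given in the steps above.
    static final String NER_IMPL_PROPERTY = "ner.impl.class";
    static final String CORENLP_NER_CLASS =
            "org.apache.tika.parser.ner.corenlp.CoreNLPNERecogniser";

    /**
     * Hypothetical helper: selects the CoreNLP recogniser unless a value was
     * already supplied (for example via -Dner.impl.class=... on the command line).
     * Returns the value that is in effect afterwards.
     */
    static String selectCoreNlpRecogniser() {
        String existing = System.getProperty(NER_IMPL_PROPERTY);
        if (existing == null || existing.isEmpty()) {
            System.setProperty(NER_IMPL_PROPERTY, CORENLP_NER_CLASS);
        }
        return System.getProperty(NER_IMPL_PROPERTY);
    }

    public static void main(String[] args) {
        // Call this first; create Tika parsers only after the property is set.
        String inEffect = selectCoreNlpRecogniser();
        System.out.println(NER_IMPL_PROPERTY + "=" + inEffect);
    }
}
```

Letting a command-line value take precedence keeps the JVM flag usable as an override without changing the code.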
> Include Stanford Name Entity Recognition in Tika
> ------------------------------------------------
>
> Key: TIKA-1787
> URL: https://issues.apache.org/jira/browse/TIKA-1787
> Project: Tika
> Issue Type: Improvement
> Components: mime, parser
> Affects Versions: 1.12
> Environment: Java 1.8, Mac OSX 10.11
> Reporter: Yueheng He
> Assignee: Chris A. Mattmann
> Labels: features, newbie, test
> Fix For: 1.12
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Using Stanford Named Entity Recognition, Tika will be able to extract named
> entities such as PERSON, ORGANIZATION, and LOCATION from the given text. The
> extracted named entities will be added to the metadata.