[ https://issues.apache.org/jira/browse/TIKA-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993220#comment-14993220 ]
Chris A. Mattmann commented on TIKA-1787:
-----------------------------------------
Great start, [~Yueheng]! The catch is that binding directly to the library
isn't possible because of its GPL license:
http://nlp.stanford.edu/software/CRF-NER.shtml#Download
However, we can include NLTK the way [~thammegowda] did in #61 on GitHub. He
and I talked about a command-line invocation of the tool that we could host on
GitHub and have Tika call at runtime, which means we wouldn't be bound by the
license.
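To make the runtime-invocation idea concrete, here is a rough sketch in Java.
It is not Tika code and not what #61 does. It assumes a hypothetical
command-line NER tool that reads text on stdin and prints one
"TYPE<TAB>entity" line per hit on stdout; the class name, method names, and
that output format are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: call an external NER tool as a separate process. */
public class ExternalNerSketch {

    /**
     * Parses lines of the ASSUMED form "TYPE<TAB>entity text" into a map of
     * entity type to the distinct entities seen, in first-seen order.
     */
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> entities = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab <= 0 || tab == line.length() - 1) {
                continue; // blank or malformed line: ignore it
            }
            String type = line.substring(0, tab).trim();
            String value = line.substring(tab + 1).trim();
            if (type.isEmpty() || value.isEmpty()) {
                continue;
            }
            entities.computeIfAbsent(type, k -> new LinkedHashSet<>()).add(value);
        }
        return entities;
    }

    /**
     * Runs the given command (hypothetical tool), feeds it the text on stdin,
     * and parses its stdout. All input is written before any output is read,
     * which is fine for small texts; a large input would need a writer thread
     * to avoid blocking on a full pipe.
     */
    static Map<String, Set<String>> recognise(List<String> command, String text)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(text.getBytes(StandardCharsets.UTF_8));
        }

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("NER tool exited with status " + exit);
        }
        return parse(lines);
    }
}
```

Because the tool runs in its own process, Tika would only depend on its
command line and output format. The returned map could then be copied into
the metadata, for example one multi-valued field per entity type.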
Let me think about this. Thank you!
> Include Stanford Name Entity Recognition in Tika
> ------------------------------------------------
>
> Key: TIKA-1787
> URL: https://issues.apache.org/jira/browse/TIKA-1787
> Project: Tika
> Issue Type: Improvement
> Components: mime, parser
> Affects Versions: 1.12
> Environment: Java 1.8, Mac OSX 10.11
> Reporter: Yueheng He
> Assignee: Chris A. Mattmann
> Labels: features, newbie, test
> Fix For: 1.12
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> Using Stanford Named Entity Recognition, Tika will be able to extract named
> entities such as PERSON, ORGANIZATION, and LOCATION from the given text. The
> extracted named entities will be added to the metadata.