Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change notification.

The "TikaAndNER" page has been changed by ThammeGowda:
https://wiki.apache.org/tika/TikaAndNER?action=diff&rev1=3&rev2=4

Comment:
Fix typos

    </parsers>
  </properties>
  }}}
- Depending on your environment, this configuration has to supplied in the later phases.
+ This configuration has to be supplied in the later phases, so store it as 'tika-config.xml'.
+ Note: The NamedEntityParser parser does not restrict mimetypes, it uses Tika's auto detect parser to read text content from non-text streams.

@@ -34, +35 @@

  || LOCATION || org/apache/tika/parser/ner/opennlp/ner-location.bin || http://opennlp.sourceforge.net/models-1.5/en-ner-location.bin||
  || ORAGANIZATION || org/apache/tika/parser/ner/opennlp/ner-organization.bin || http://opennlp.sourceforge.net/models-1.5/en-ner-organization.bin||
  || DATE || org/apache/tika/parser/ner/opennlp/ner-date.bin || http://opennlp.sourceforge.net/models-1.5/en-ner-date.bin||
- || MONEY || org/apache/tika/parser/ner/opennlp/ner-money.bin || http://opennlp.sourceforge.net/models-1.5/en-ner-money.bin||
+ || TIME || org/apache/tika/parser/ner/opennlp/ner-time.bin || http://opennlp.sourceforge.net/models-1.5/en-ner-time.bin||
  || PERCENT || org/apache/tika/parser/ner/opennlp/ner-percentage.bin || http://opennlp.sourceforge.net/models-1.5/en-ner-percentage.bin ||
  || MONEY || org/apache/tika/parser/ner/opennlp/ner-money.bin || http://opennlp.sourceforge.net/models-1.5/en-ner-money.bin ||

@@ -106, +107 @@

  }}}
- The CoreNLP CRF classifier recognised the following with in text content of http://www.hawking.org.uk page:
+ The CoreNLP CRF classifier recognised the following from the text content of http://www.hawking.org.uk page:
  {{{
  NER_DATE: 2009
  NER_DATE: 1963

@@ -151, +152 @@

  ==== Using Regular Expressions ====
- The '''org.apache.tika.parser.ner.regex.RegexNERecogniser''' implementation based on Regular expressions. The following steps are required to use this implementation:
+ The '''org.apache.tika.parser.ner.regex.RegexNERecogniser''' provides an implementation based on Regular expressions. The following steps are required to use this implementation:
   * Configure regular expressions in 'org/apache/tika/parser/ner/regex/ner-regex.txt'
-  * Set ``ner.impl.class`` to Regex implementation
+  * Set System property ``ner.impl.class`` to ''org.apache.tika.parser.ner.regex.RegexNERecogniser''
  
  ===== Tika + RegexNER in action =====
  {{{

@@ -186, +187 @@

  ==== Chaining all the above at once ====
  Multiple class names can be provided by setting the system property ''ner.impl.class'' to a comma separtes class names
- Example : ''-Dner.impl.class=org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser,org.apache.tika.parser.ner.regex.RegexNERecogniser''
+ Example : '' -Dner.impl.class = org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser,org.apache.tika.parser.ner.regex.RegexNERecogniser''
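
The chaining step described in the last hunk can also be done from code instead of a -D flag. The following is only a minimal sketch using the Java standard library: the class name NerChainConfig and its helper methods are hypothetical and not part of Tika, and it assumes the property is set before the NER parser is first loaded.

```java
public class NerChainConfig {
    // Property name and class names are taken from the wiki text above.
    public static final String NER_IMPL_PROPERTY = "ner.impl.class";

    /** Joins recogniser class names into a single comma-separated value. */
    public static String chain(String... classNames) {
        return String.join(",", classNames);
    }

    /**
     * Sets the system property to the chained value and returns it.
     * Assumption: this runs before the NER parser reads the property.
     */
    public static String apply(String... classNames) {
        String value = chain(classNames);
        System.setProperty(NER_IMPL_PROPERTY, value);
        return value;
    }

    public static void main(String[] args) {
        String value = apply(
                "org.apache.tika.parser.ner.opennlp.OpenNLPNERecogniser",
                "org.apache.tika.parser.ner.regex.RegexNERecogniser");
        // Prints the equivalent command-line flag for reference.
        System.out.println("-D" + NER_IMPL_PROPERTY + "=" + value);
    }
}
```

When the same value is passed on the command line, the JVM expects the flag as a single argument of the form -Dname=value, so it should be written without spaces around the equals sign.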
