[
https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512910#comment-14512910
]
Chris A. Mattmann commented on NUTCH-1994:
------------------------------------------
OK, so here's some more info. I printed out the set of parsers returned from
the creation of the TikaConfig using the class's system class loader, along
with the default one in Tika. Both return {} as the list of supported parsers,
which indicates that something is screwy in SPI loading:
{noformat}
CREATE OUR OWN TIKA CONFIG default parser is org.apache.tika.parser.DefaultParser
supported parsers {}
PARSER RETRIEVED! NULL!
2015-04-25 23:25:34,046 ERROR tika.TikaParser (TikaParser.java:getParse(87)) - Can't retrieve Tika parser for mime-type text/plain
RESULT TEXT! textfile.txt
HERE IS THE PARSE TEXT textfile.txt
{noformat}
Furthermore, the upgrade needed more updates to plugin.xml; see the attached
patch. That didn't fix the issue, but it is needed regardless. I will keep digging.
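For anyone who wants to see the symptom in isolation, here is a minimal sketch of
how SPI discovery depends on which class loader can see the META-INF/services
files. It uses only the JDK's java.util.ServiceLoader, not Tika or Nutch code. The
names SpiLoadingCheck, Parser, PlainTextParser, discover and
loaderWithServicesFile are hypothetical stand-ins, and the sketch only assumes
that the empty {} above has the same kind of cause (a class loader that cannot
see the provider registration), which is not confirmed yet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class SpiLoadingCheck {

    /** Hypothetical stand-in for a parser SPI interface (not Tika's Parser). */
    public interface Parser {
        String name();
    }

    /** Hypothetical provider; it is only discovered if a services file names it. */
    public static class PlainTextParser implements Parser {
        public PlainTextParser() {}

        @Override
        public String name() {
            return "plain-text";
        }
    }

    /** Returns the names of all Parser providers visible through the given loader. */
    public static List<String> discover(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Parser p : ServiceLoader.load(Parser.class, loader)) {
            names.add(p.name());
        }
        return names;
    }

    /**
     * Builds a class loader that additionally exposes a temp directory holding
     * META-INF/services/<Parser interface name>, which registers PlainTextParser.
     */
    public static ClassLoader loaderWithServicesFile() {
        try {
            Path root = Files.createTempDirectory("spi-demo");
            Path services = root.resolve("META-INF").resolve("services");
            Files.createDirectories(services);
            Files.write(services.resolve(Parser.class.getName()),
                    PlainTextParser.class.getName().getBytes(StandardCharsets.UTF_8));
            // Parent is the loader that defines the provider class itself.
            return new URLClassLoader(new URL[] { root.toUri().toURL() },
                    SpiLoadingCheck.class.getClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No services file visible: the provider list comes back empty.
        System.out.println(discover(SpiLoadingCheck.class.getClassLoader()));
        // Services file visible: the same provider class is now discovered.
        System.out.println(discover(loaderWithServicesFile()));
    }
}
```

Run as-is, this should print [] and then [plain-text]: the provider class is
loadable in both cases, and only the visibility of the services file changes.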
> Upgrade to Apache Tika 1.8
> --------------------------
>
> Key: NUTCH-1994
> URL: https://issues.apache.org/jira/browse/NUTCH-1994
> Project: Nutch
> Issue Type: Improvement
> Components: build, parser
> Affects Versions: 1.10, 2.3.1
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.10, 2.3.1
>
> Attachments: NUTCH-1994-2.x.patch, NUTCH-1994-trunk.patch
>
>
> Tika 1.8 was released this morning.
> Let's upgrade, then release Nutch trunk.