[
https://issues.apache.org/jira/browse/TIKA-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281129#comment-14281129
]
Konstantin Gribov commented on TIKA-1516:
-----------------------------------------
[~lewismc], if I understood you correctly (English isn't my strong point), you
asked whether this case is general or specific to some input. My point is that
this issue has a general cause that can be reproduced only in a non-trivial
classloading scheme. A brief analysis follows.
Rome 1.0 uses {{Thread.currentThread().getContextClassLoader()}} for loading
both {{com/sun/syndication/rome.properties}} (the default Rome config) and
{{rome.properties}} (the user config). If the classloader used to load Rome is
the current thread's context classloader (CCL) or one of its ancestors, this
works fine.
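The "CCL or its ancestor" condition can be checked by walking the parent chain. The following is only a minimal sketch: {{LoaderAncestry}} and {{isSelfOrAncestor}} are hypothetical names, not part of Rome or Tika, and the check assumes plain parent delegation, which some container classloaders do not follow strictly.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderAncestry {
    // Returns true if 'candidate' is 'start' itself or appears in its parent chain.
    // A null 'candidate' stands for the bootstrap loader, which ends every chain.
    static boolean isSelfOrAncestor(ClassLoader candidate, ClassLoader start) {
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            if (cl == candidate) {
                return true;
            }
        }
        return candidate == null;
    }

    public static void main(String[] args) throws Exception {
        ClassLoader system = ClassLoader.getSystemClassLoader();
        // A child loader with no URLs of its own, parented on the system loader.
        try (URLClassLoader child = new URLClassLoader(new URL[0], system)) {
            // The system loader is an ancestor of the child, so a lookup through
            // the child can also see what the system loader sees.
            System.out.println(isSelfOrAncestor(system, child));
            // The reverse does not hold: the system loader cannot see resources
            // that only the child knows about.
            System.out.println(isSelfOrAncestor(child, system));
        }
    }
}
```

If the loader that loaded Rome passes this check against the CCL, a lookup through the CCL should be able to see the resources in Rome's jar.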
Earlier Rome (0.9) used the same classloader that loaded Rome's
PluginManager class (which invokes PropertiesLoader). This approach finds the
default Rome config, since both are loaded from the same jar. But if user code
is loaded by another classloader (e.g. for security reasons) in a different
classloading branch, Rome can't find the user config. I don't know Hadoop's
classloading scheme (and Nutch uses Hadoop), but such a case can be reproduced
easily in a servlet container if Rome is loaded by the ext/common classloader
and the app by the webapp classloader.
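The servlet-container case can be imitated with two {{URLClassLoader}} instances. This is a rough sketch, not Rome's code: the class name, the resource names {{demo-default.properties}} and {{demo-user.properties}}, and the temporary directories are all made up for illustration. The "library" loader plays the role of the ext/common classloader and the "app" loader plays the webapp classloader.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LoaderBranchSketch {
    // Hypothetical resource names standing in for the default and user configs.
    static final String DEFAULT_CFG = "demo-default.properties";
    static final String USER_CFG = "demo-user.properties";

    // True if 'loader' (including its parents) can locate the resource.
    static boolean visible(ClassLoader loader, String resource) {
        return loader.getResource(resource) != null;
    }

    // Creates a temporary directory containing one small properties file.
    static Path dirWith(String fileName) throws IOException {
        Path dir = Files.createTempDirectory("loader-sketch");
        Files.write(dir.resolve(fileName), "key=value\n".getBytes(StandardCharsets.UTF_8));
        return dir;
    }

    // Returns visibility of {default, user} via the library loader,
    // followed by {default, user} via the app loader.
    static boolean[] demo() throws IOException {
        URL libUrl = dirWith(DEFAULT_CFG).toUri().toURL();
        URL appUrl = dirWith(USER_CFG).toUri().toURL();
        try (URLClassLoader lib = new URLClassLoader(new URL[] { libUrl }, null);
             URLClassLoader app = new URLClassLoader(new URL[] { appUrl }, lib)) {
            return new boolean[] {
                visible(lib, DEFAULT_CFG), // library loader sees its own jar
                visible(lib, USER_CFG),    // but not the app's resources
                visible(app, DEFAULT_CFG), // app loader delegates to its parent
                visible(app, USER_CFG)     // and sees its own resources
            };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = demo();
        System.out.println("library loader: default=" + r[0] + ", user=" + r[1]);
        System.out.println("app loader:     default=" + r[2] + ", user=" + r[3]);
    }
}
```

A lookup through the library loader, which corresponds to what Rome 0.9 did according to the analysis above, misses the user config, while a lookup through the app loader finds both files.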
I think this was the reason for switching to the CCL, but it led to a new
problem. If the CCL is set to the system classloader and Rome is loaded by a
descendant classloader ({{PluginClassLoader}} in this case), Rome can't load
its default config and ends with an NPE as above.
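The failure mode can be sketched without Rome or Nutch on the classpath. Everything here is an assumption made for illustration: the class name, the resource name {{demo-default.properties}} (standing in for {{com/sun/syndication/rome.properties}}), and a plain {{URLClassLoader}} standing in for {{PluginClassLoader}}. The point is only that a resource lookup through the CCL returns null when the resource is visible solely to a descendant loader, and that {{Properties.load}} throws an NPE when handed that null stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class CclMismatchSketch {
    // Hypothetical resource name; it exists only in the "plugin" directory.
    static final String RESOURCE = "demo-default.properties";

    // Loads RESOURCE through the thread context classloader without a null
    // check. Returns "ok" on success and "NPE" if Properties.load rejects
    // the null stream produced by a failed lookup.
    static String loadViaCcl() throws IOException {
        ClassLoader ccl = Thread.currentThread().getContextClassLoader();
        Properties props = new Properties();
        try (InputStream in = ccl.getResourceAsStream(RESOURCE)) {
            props.load(in);
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    // Runs the lookup twice: first with the CCL set to the system loader,
    // then with the CCL set to the descendant loader that owns the resource.
    static String[] demo() throws IOException {
        Path dir = Files.createTempDirectory("plugin");
        Files.write(dir.resolve(RESOURCE), "key=value\n".getBytes(StandardCharsets.UTF_8));
        ClassLoader system = ClassLoader.getSystemClassLoader();
        Thread thread = Thread.currentThread();
        ClassLoader saved = thread.getContextClassLoader();
        try (URLClassLoader plugin =
                 new URLClassLoader(new URL[] { dir.toUri().toURL() }, system)) {
            thread.setContextClassLoader(system);
            String withSystemCcl = loadViaCcl();
            thread.setContextClassLoader(plugin);
            String withPluginCcl = loadViaCcl();
            return new String[] { withSystemCcl, withPluginCcl };
        } finally {
            thread.setContextClassLoader(saved); // always restore the original CCL
        }
    }

    public static void main(String[] args) throws IOException {
        String[] r = demo();
        System.out.println("CCL = system loader: " + r[0]);
        System.out.println("CCL = plugin loader: " + r[1]);
    }
}
```

With the CCL pointing at the system loader the lookup fails, and the second run shows that setting the CCL to the descendant loader around the call makes the same lookup succeed. Whether that is an acceptable workaround for Tika or Nutch is a separate question.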
I'll try to create a test for this case soon.
> Downgrade Rome dependency to 0.9 to avoid nasty NPE
> ---------------------------------------------------
>
> Key: TIKA-1516
> URL: https://issues.apache.org/jira/browse/TIKA-1516
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.8
>
> Attachments: TIKA-1516.patch
>
>
> As documented [in this
> thread|http://www.mail-archive.com/dev%40nutch.apache.org/msg15755.html]
> Nutch's
> [parse-tika|https://github.com/apache/nutch/blob/trunk/src/plugin/parse-tika/plugin.xml#L56]
> uses Rome 1.0; this is inherited directly from the Tika pom.xml for the
> [same
> dependency|https://github.com/apache/tika/blob/trunk/tika-parsers/pom.xml#L184].
> A downgrade is required.
> {code}
> java.lang.Exception: java.lang.ExceptionInInitializerError
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.ExceptionInInitializerError
>     at com.sun.syndication.io.SyndFeedInput.build(SyndFeedInput.java:136)
>     at org.apache.tika.parser.feed.FeedParser.parse(FeedParser.java:70)
>     at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:105)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:95)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:101)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.NullPointerException
>     at java.util.Properties$LineReader.readLine(Properties.java:418)
>     at java.util.Properties.load0(Properties.java:337)
>     at java.util.Properties.load(Properties.java:325)
>     at com.sun.syndication.io.impl.PropertiesLoader.<init>(PropertiesLoader.java:74)
>     at com.sun.syndication.io.impl.PropertiesLoader.getPropertiesLoader(PropertiesLoader.java:46)
>     at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:54)
>     at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:46)
>     at com.sun.syndication.feed.synd.impl.Converters.<init>(Converters.java:40)
>     at com.sun.syndication.feed.synd.SyndFeedImpl.<clinit>(SyndFeedImpl.java:59)
>     ... 16 more
> {code}