[
https://issues.apache.org/jira/browse/NUTCH-1893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-1893:
-----------------------------------
Attachment: NUTCH-1893-v1.patch
Confirmed. A manual fix is, however, not ideal. Until the problem is fixed in
ROME (see [issue on GitHub|https://github.com/rometools/rome/issues/130]), we
could override the dependency in parse-tika's ivy.xml (and plugin.xml); a
patch is attached.
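
As background for the trace quoted below: its "Caused by" section shows a NullPointerException raised inside java.util.Properties.load, which is how the JDK reacts when it is handed a null stream, for instance when getResourceAsStream finds no matching resource. Whether that is exactly what happens inside ROME's PropertiesLoader is an assumption here and has not been checked against the ROME source. A minimal sketch of the general failure mode, using a hypothetical class name and a hypothetical resource name:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class MissingResourceDemo {

    // Hypothetical resource name, chosen so that it is absent from the classpath.
    public static final String MISSING = "no/such/plugin-config.properties";

    /** Returns true if loading the named classpath resource fails with a NullPointerException. */
    public static boolean loadThrowsNpe(String resourceName) {
        ClassLoader loader = MissingResourceDemo.class.getClassLoader();
        // getResourceAsStream returns null (it does not throw) when the resource is absent.
        // try-with-resources tolerates a null resource and simply skips close().
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            new Properties().load(in); // a null stream triggers the NullPointerException
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            // The resource existed but could not be read: a different failure.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE on missing resource: " + loadThrowsNpe(MISSING));
    }
}
```

If the same thing occurs inside a static initializer, the JVM wraps the exception in an ExceptionInInitializerError, which would match the top of the quoted trace.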
> Parse-tika plugin seems broken when parsing some feed file
> ----------------------------------------------------------
>
> Key: NUTCH-1893
> URL: https://issues.apache.org/jira/browse/NUTCH-1893
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.9
> Environment: Windows 7 + Cygwin + JDK 7
> Reporter: Mengying Wang
> Priority: Minor
> Attachments: NUTCH-1893-v1.patch, NUTCH-1893.mywang.141209.txt
>
>
> In the Nutch parse step, I received the following error. It seems the
> parse-tika plugin is broken.
> $ /cygdrive/d/nutch_trunk/runtime/local/bin/nutch parse -D
> mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
> mapred.reduce.tasks.speculative.execution=false -D
> mapred.map.tasks.speculative.execution=false -D
> mapred.compress.map.output=true -D mapred.skip.attempts.to.start.skipping=2
> -D mapred.skip.map.max.skip.records=1 crawlId/segments/20141118235323
> java.lang.ExceptionInInitializerError
> at com.sun.syndication.io.SyndFeedInput.build(SyndFeedInput.java:136)
> at org.apache.tika.parser.feed.FeedParser.parse(FeedParser.java:70)
> at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:103)
> at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:95)
> at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:101)
> at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: java.lang.NullPointerException
> at java.util.Properties$LineReader.readLine(Properties.java:434)
> at java.util.Properties.load0(Properties.java:353)
> at java.util.Properties.load(Properties.java:341)
> at com.sun.syndication.io.impl.PropertiesLoader.<init>(PropertiesLoader.java:74)
> at com.sun.syndication.io.impl.PropertiesLoader.getPropertiesLoader(PropertiesLoader.java:46)
> at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:54)
> at com.sun.syndication.io.impl.PluginManager.<init>(PluginManager.java:46)
> at com.sun.syndication.feed.synd.impl.Converters.<init>(Converters.java:40)
> at com.sun.syndication.feed.synd.SyndFeedImpl.<clinit>(SyndFeedImpl.java:59)
> ... 10 more