[
https://issues.apache.org/jira/browse/NUTCH-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699508#comment-13699508
]
Markus Jelsma commented on NUTCH-1596:
--------------------------------------
Of course! I was already a bit suspicious about that since our last discussion
on Jira about class member vars not being thread safe! I'll test your patch
tomorrow! It should fix the issue, because other issues we had also relied on
class member vars!
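
To illustrate the kind of problem meant here, below is a minimal sketch, not the actual Nutch code and not the attached patch. It assumes a filter instance is shared by several parse threads. Node, Walker, UnsafeFilter, SafeFilter and the helper methods are hypothetical stand-ins invented for this example. The unsafe variant keeps the walker in a class member var, so one thread can replace it while another is still walking. The safe variant keeps the walker in a local variable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WalkerThreadSafetySketch {

    /** Minimal stand-in for a DOM node (hypothetical, not org.w3c.dom.Node). */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node add(Node child) { children.add(child); return this; }
    }

    /** Stack-based depth-first walker; it holds mutable traversal state. */
    static final class Walker {
        private final Deque<Node> stack = new ArrayDeque<>();
        Walker(Node root) { stack.push(root); }
        boolean hasNext() { return !stack.isEmpty(); }
        Node nextNode() {
            Node current = stack.pop();
            // Push children in reverse so they are visited in document order.
            for (int i = current.children.size() - 1; i >= 0; i--) {
                stack.push(current.children.get(i));
            }
            return current;
        }
    }

    /** Unsafe shape, shown for contrast only and never called below. */
    static final class UnsafeFilter {
        private Walker walker; // class member var shared by all calling threads
        int count(Node root, String name) {
            walker = new Walker(root); // another thread may overwrite this mid-walk
            int n = 0;
            while (walker.hasNext()) {
                if (walker.nextNode().name.equals(name)) n++;
            }
            return n;
        }
    }

    /** Safe shape: the walker lives on the calling thread's stack. */
    static final class SafeFilter {
        int count(Node root, String name) {
            Walker walker = new Walker(root); // local, so never shared
            int n = 0;
            while (walker.hasNext()) {
                if (walker.nextNode().name.equals(name)) n++;
            }
            return n;
        }
    }

    /** Small example document with two h1 elements. */
    static Node sampleTree() {
        Node div = new Node("div").add(new Node("h1")).add(new Node("p"));
        Node body = new Node("body").add(new Node("h1")).add(div).add(new Node("h2"));
        return new Node("html").add(body);
    }

    /** Runs one shared SafeFilter from several threads and collects the counts. */
    static List<Integer> countConcurrently(Node root, String name, int threads) {
        SafeFilter filter = new SafeFilter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Integer> task = () -> filter.count(root, name);
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(sampleTree(), "h1", 8));
    }
}
```

The sketch only reads the tree. Whether the DOM implementation itself tolerates concurrent reads is a separate question that it does not address.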
Cheers
> NodeWalker NPE on next node
> ---------------------------
>
> Key: NUTCH-1596
> URL: https://issues.apache.org/jira/browse/NUTCH-1596
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.7
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.8
>
> Attachments: NUTCH-1596-v1.patch
>
>
> The NodeWalker used by the HeadingsParseFilter sometimes reports a
> NullPointerException.
> {code}
> 2013-07-02 11:02:09,428 WARN parse.ParseUtil - Error parsing .... with org.apache.nutch.parse.tika.TikaParser@2c8b586a
> java.util.concurrent.ExecutionException: java.lang.NullPointerException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
> at java.util.concurrent.FutureTask.get(FutureTask.java:119)
> at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:162)
> at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:963)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
> Caused by: java.lang.NullPointerException
> at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
> at org.apache.xerces.dom.ParentNode.item(Unknown Source)
> at org.apache.nutch.util.NodeWalker.nextNode(NodeWalker.java:75)
> at org.apache.nutch.parse.headings.HeadingsParseFilter.getElement(HeadingsParseFilter.java:84)
> at org.apache.nutch.parse.headings.HeadingsParseFilter.filter(HeadingsParseFilter.java:47)
> at org.apache.nutch.parse.HtmlParseFilters.filter(HtmlParseFilters.java:98)
> at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:210)
> at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:35)
> at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:24)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> {code}
> This is strange because it fails only rarely, the nextNode() method checks
> hasNext(), and there is no concurrent access, if I'm correct.