[ https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993481#comment-13993481 ]
Julien Nioche commented on NUTCH-1714:
--------------------------------------
We are getting
{code}
java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232)
at java.util.concurrent.FutureTask.get(FutureTask.java:91)
at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:147)
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:128)
at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:142)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:199)
Caused by: java.lang.NullPointerException
at org.apache.nutch.parse.ParseStatusUtils.getEmptyParse(ParseStatusUtils.java:91)
at org.apache.nutch.parse.tika.TikaParser.getParse(TikaParser.java:92)
at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:36)
at org.apache.nutch.parse.ParseCallable.call(ParseCallable.java:23)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
{code}
when the parser fails. This is due to status.getArgs() returning null.
./nutch parsechecker -D parse.timeout=-1 "http://api.addthis.com/oexchange/0.8/forward/delicious/offer?username=addthiseere&url=www1.eere.energy.gov/buildings/commercial/news_detail.html%253Fnews_id=18485&title=Building%2520Technologies%2520Program:%2520News"
will illustrate the issue.
We should not get this NPE when a parser fails.
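
One possible way to guard against a null args list is sketched below. This is only an illustration, not the actual Nutch code: the Status class and the firstArgOrEmpty helper are hypothetical stand-ins, and the only thing taken from the trace above is that getArgs() can return null when a parser fails.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: NOT the real ParseStatusUtils, just the null-guard idea.
public class NullSafeArgsSketch {

    // Hypothetical stand-in for a parse status whose args list may be null.
    static final class Status {
        private final List<String> args; // null when the parser failed without a message

        Status(List<String> args) {
            this.args = args;
        }

        List<String> getArgs() {
            return args;
        }
    }

    // Returns the first status argument, or "" if the status, the list,
    // or the first element is missing. Never throws on null input.
    static String firstArgOrEmpty(Status status) {
        if (status == null) {
            return "";
        }
        List<String> args = status.getArgs();
        if (args == null || args.isEmpty()) {
            return "";
        }
        String first = args.get(0);
        return first == null ? "" : first;
    }

    public static void main(String[] argv) {
        // A failed parse with no args no longer triggers an NPE.
        System.out.println("[" + firstArgOrEmpty(new Status(null)) + "]");
        // A status that carries a message is passed through unchanged.
        System.out.println("[" + firstArgOrEmpty(new Status(Arrays.asList("timeout"))) + "]");
    }
}
```

With a guard like this, the code building the empty parse could use an empty message whenever the failed parser supplied no arguments, instead of dereferencing null.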
> Nutch 2.x upgrade to Gora 0.4
> -----------------------------
>
> Key: NUTCH-1714
> URL: https://issues.apache.org/jira/browse/NUTCH-1714
> Project: Nutch
> Issue Type: Improvement
> Reporter: Alparslan Avcı
> Assignee: Alparslan Avcı
> Fix For: 2.3
>
> Attachments: NUTCH-1714.patch, NUTCH-1714_NUTCH-1714_v2_v3.patch,
> NUTCH-1714v2.patch, NUTCH-1714v4.patch, NUTCH-1714v5.patch
>
>
> Nutch upgrade for GORA_94 branch has to be implemented. We can discuss the
> details in this issue.