Hi,
I am not sure if this is what Binoy is talking about, but here it is:
The original exception comes from
src/java/org/apache/nutch/indexer/IndexUtil.java, line 66:
    public NutchDocument index(String key, WebPage page) {
      NutchDocument doc = new NutchDocument();
      doc.add("id", key);
      doc.add("digest", StringUtil.toHexString(page.getSignature().array()));
==>>  doc.add("batchId", page.getBatchId().toString());
page.getBatchId() returns null for every URL. My guess is that updatedb
removes the batchId from the rows in the webpage table: generate and
fetch work fine with the batchId, but after updatedb runs, solrindex
can't find the batchIds. (By the way, updatedb does not accept a batchId
as one of its parameters, which means it goes over the entire webpage
table every time you run it, but that is a different issue.)
I am not sure, though; I am going over the code right after I hit send :)
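To make the failure mode concrete, here is a minimal sketch of a null guard around the batchId lookup. It is not a patch against Nutch: Page and the Map-based document are hypothetical stand-ins for WebPage and NutchDocument, and the batchId value is made up. It only shows that calling toString() on a null batchId is what throws, and that skipping the field avoids the exception (it does not explain why the batchId is missing).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchIdGuardSketch {

    // Hypothetical stand-in for Nutch's WebPage; only the batchId accessor is modeled.
    static final class Page {
        private final CharSequence batchId;

        Page(CharSequence batchId) {
            this.batchId = batchId;
        }

        CharSequence getBatchId() {
            return batchId;
        }
    }

    // Hypothetical stand-in for IndexUtil.index(); a Map plays the role of NutchDocument.
    static Map<String, String> index(String key, Page page) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", key);

        // Unguarded, page.getBatchId().toString() throws a NullPointerException
        // when the batchId is absent. Here the field is only added when present.
        CharSequence batchId = page.getBatchId();
        if (batchId != null) {
            doc.put("batchId", batchId.toString());
        }
        return doc;
    }

    public static void main(String[] args) {
        // The key and batchId below are invented values for illustration only.
        System.out.println(index("com.example:http/", new Page("1364932500-12345")));
        System.out.println(index("com.example:http/", new Page(null)));
    }
}
```

Whether silently dropping the field is the right behavior for solrindex is a separate question; the guard only keeps the job from dying while the root cause in updatedb (if that is where it is) gets tracked down.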
On 04/02/2013 01:55 PM, Lewis John Mcgibbney wrote:
Hi Binoy,
On Tue, Apr 2, 2013 at 11:42 AM, <[email protected]> wrote:
Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh
crawl with One Seed.
22979 by: Binoy d
Hi Lewis,
I understand the HEAD branch can be unstable some of the time. I was
trying to point out that I was not able to reproduce the issue with
HEAD for 2.x. I will try to create the JIRA after I am back from the
office. I try not to create JIRAs without confirming the issue;
they just tend to add noise. I haven't used the crawl scripts much,
so it might take some time for me to get logs from there.
Anything you can do to help us better understand the source of the issue
is greatly appreciated, Binoy. Thank you for your perseverance (and to
the others who are helping on these issues); it is of real value to the
Nutch community.
Best
Lewis
--
Kaveh Minooie