Hi

I am not sure whether this is the issue Binoy is talking about, but here it is:

The original exception comes from
src/java/org/apache/nutch/indexer/IndexUtil.java  line 66

 public NutchDocument index(String key, WebPage page) {
    NutchDocument doc = new NutchDocument();
    doc.add("id", key);
    doc.add("digest", StringUtil.toHexString(page.getSignature().array()));
==>>    doc.add("batchId", page.getBatchId().toString());

page.getBatchId() returns null for every URL. My guess is that updatedb removes the batchId from the rows in the webpage table: generate and fetch work fine with the batchId, but after updatedb has run, solrindex can't find the batchIds. (By the way, updatedb does not accept a batchId as one of its parameters, which means it goes over the entire webpage table every time you run it, but that is a different issue.)
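
For what it's worth, a null check around that line would avoid the NPE itself, although it would not explain why the batchId is missing in the first place. The following is only a minimal sketch under stated assumptions: Page and the Map-based document are hypothetical stand-ins for WebPage and NutchDocument, not the real Nutch classes, and the key and batchId values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BatchIdGuard {

    /** Hypothetical stand-in for the part of WebPage that matters here. */
    static class Page {
        private final CharSequence batchId; // may be null, as observed after updatedb

        Page(CharSequence batchId) {
            this.batchId = batchId;
        }

        CharSequence getBatchId() {
            return batchId;
        }
    }

    /** Builds a simplified document; a plain Map stands in for NutchDocument. */
    static Map<String, String> index(String key, Page page) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", key);

        // Only add the field when a batchId is present, instead of calling
        // toString() on a possibly null reference.
        CharSequence batchId = page.getBatchId();
        if (batchId != null) {
            doc.put("batchId", batchId.toString());
        }
        return doc;
    }

    public static void main(String[] args) {
        // A row without a batchId no longer throws; the field is simply absent.
        System.out.println(index("com.example:http/", new Page(null)));
        // A row with a batchId is indexed as before.
        System.out.println(index("com.example:http/", new Page("hypothetical-batch-1")));
    }
}
```

This only hides the symptom; if updatedb really does drop the batchId, that would still need to be tracked down separately.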

Though I am not sure; I am going over the code right after I hit send :)


On 04/02/2013 01:55 PM, Lewis John Mcgibbney wrote:
Hi Binoy,


On Tue, Apr 2, 2013 at 11:42 AM, <[email protected]> wrote:


    Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh
    crawl with One Seed.
             22979 by: Binoy d

    Hi Lewis,
    I understand the head branch can be unstable some of the time. I was
    trying to point out that I was not able to reproduce the issue with
    HEAD for 2.x . I will try and create the jira after I am back from
    office.  I try not to create JIRAs without confirming the issue,
    they just tend to add noise. I haven't used the crawl scripts much
    so it might take some time for me to get logs from there .


Anything you can do to help us better understand the source of the issue
is greatly appreciated, Binoy. Thank you for your perseverance (and to
the others who are helping on these issues); it is of real value to the
Nutch community.
Best
Lewis

--
Kaveh Minooie
