Re: dev Digest 2 Apr 2013 18:42:33 -0000 Issue 1587

kaveh minooie Tue, 02 Apr 2013 15:30:36 -0700

Hi

So I am not sure if Binoy is talking about this, but here it is:


The original exception comes from
src/java/org/apache/nutch/indexer/IndexUtil.java, line 66:

  public NutchDocument index(String key, WebPage page) {
    NutchDocument doc = new NutchDocument();
    doc.add("id", key);
    doc.add("digest", StringUtil.toHexString(page.getSignature().array()));
    doc.add("batchId", page.getBatchId().toString());  // <== line 66

page.getBatchId() returns null for every URL. My guess is that updatedb removes the batchId from the rows in the webpage table: generate and fetch work fine with batchId, but after updatedb, solrindex can't find the batchIds. (By the way, updatedb does not accept batchId as one of its parameters, which means it goes over the entire webpage table every time you run it, but that is a different issue.)


Though I am not sure; I am going over the code right after I hit send :)
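For illustration only, here is a minimal, hypothetical sketch of a null guard around the failing line 66, so that rows whose batchId was cleared are still indexed (just without a batchId field). This is not the actual Nutch fix: a plain Map stands in for NutchDocument, a CharSequence stands in for the value returned by page.getBatchId(), and the class name BatchIdGuard plus the key, digest and batch id values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a null-safe version of the batchId line in
 * IndexUtil.index(). A Map stands in for NutchDocument and a CharSequence
 * stands in for WebPage.getBatchId(); neither is the real Nutch API.
 */
public class BatchIdGuard {

    /** Builds the indexed fields, skipping batchId when the row has none. */
    static Map<String, String> index(String key, String digestHex, CharSequence batchId) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", key);
        doc.put("digest", digestHex);
        // Calling toString() on a null batchId is what throws the NPE at
        // line 66, so only add the field when the value is present.
        if (batchId != null) {
            doc.put("batchId", batchId.toString());
        }
        return doc;
    }

    public static void main(String[] args) {
        // A row whose batchId was cleared (e.g. after updatedb, per the guess above).
        System.out.println(index("com.example.www:http/", "ab12", null));
        // A row that still carries a (made-up) batch id from generate/fetch.
        System.out.println(index("com.example.www:http/", "ab12", "batch-1"));
    }
}
```

Skipping the field only hides the symptom; if updatedb really clears the batchId, solrindex still cannot select documents by batch, which is the underlying issue.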


On 04/02/2013 01:55 PM, Lewis John Mcgibbney wrote:

Hi Binoy,


On Tue, Apr 2, 2013 at 11:42 AM, <[email protected]> wrote:


    Re: Nutch 2.x Null Pointer Exception in IndexerJob.java for a fresh
    crawl with One Seed.
             22979 by: Binoy d

    Hi Lewis,
    I understand the HEAD branch can be unstable some of the time. I was
    trying to point out that I was not able to reproduce the issue with
    HEAD for 2.x. I will try to create the JIRA after I am back from the
    office. I try not to create JIRAs without confirming the issue; they
    just tend to add noise. I haven't used the crawl scripts much, so it
    might take some time for me to get logs from there.


Anything you can do to help us better understand the source of the issue
is greatly appreciated, Binoy. Thank you for your perseverance (and thanks
to the others who are helping on these issues); it is of real value to
the Nutch community.
Best
Lewis


--
Kaveh Minooie
