Hi Kaveh,
I am running 2.x HEAD with NUTCH-1532-v3 [0] and NUTCH-1551 [1].

On Fri, Apr 5, 2013 at 12:22 AM, <[email protected]> wrote:

>
> Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh crawl
> with One Seed
>         23024 by: Lewis John Mcgibbney
>         23031 by: kaveh minooie
>


> the fields: baseUrl, protocolStatus, reprUrl, batchId are null and the
> outlinks is empty. I am still in the process of familiarizing myself with
> code, so I can't say it for sure, and I apologize for asking stupid
> questions while we are at it, but this doesn't seem right to me, am I right
> to assume that the mentioned fields or at least most of them should have
> values?
>

This should definitely not happen. From a fresh crawl, I am running a
normal inject, generate, fetch, parse, update cycle with Nutch 2.x HEAD,
gora-core 0.2.1 and gora-cassandra 0.2. I've uploaded my webtable dump
after each phase; the output for each stage can be seen here [2].

You can clearly see that the fields you mention above are populated at
the following stages:
batchId: Generate
baseUrl: Fetch
protocolStatus: Fetch
reprUrl: NULL????
outlinks: Parse
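
To make the mapping above easier to check against a dumped row, here is a
rough sketch in Python. It is not Nutch code: the helper name, the stage
list and the dict-shaped row are all assumptions for illustration, and
reprUrl is left out because it was still null in my own dumps.

```python
# Hypothetical helper, not part of Nutch. The stage order is the crawl
# cycle described in this thread.
STAGES = ["inject", "generate", "fetch", "parse", "update"]

# Stage at which each field was first seen populated in the dumps.
# reprUrl is deliberately omitted (it was null in every dump).
POPULATED_AT = {
    "batchId": "generate",
    "baseUrl": "fetch",
    "protocolStatus": "fetch",
    "outlinks": "parse",
}


def unexpected_empty_fields(row, completed_stage):
    """Return the fields that should be set after completed_stage but are empty."""
    done = STAGES.index(completed_stage)
    missing = []
    for field, stage in POPULATED_AT.items():
        # None, "" and empty collections all count as "not populated".
        if STAGES.index(stage) <= done and not row.get(field):
            missing.append(field)
    return sorted(missing)
```

Feeding it a row shaped like the one you describe (batchId set aside, the
rest null or empty) after the parse stage would flag every field that the
dumps show should have a value by then.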


>
> also, the example that I am showing here is not a one off, these fields
> have the same value for all, emphasis on ALL, the few thousand URLs that
> I have fetched and with which I am playing to test the code.
>

For my own sanity, I would like to get to the bottom of this. However, at
this stage, I cannot reproduce your situation.

[0]
https://issues.apache.org/jira/secure/attachment/12576855/NUTCH-1532-v3.patch
[1]
https://issues.apache.org/jira/secure/attachment/12576254/NUTCH-1551.patch
[2] http://people.apache.org/~lewismc/nutch_test/
