Hi Kaveh,

I am running 2.x HEAD with NUTCH-1532-v3 [0] & NUTCH-1551 [1].

On Fri, Apr 5, 2013 at 12:22 AM, <[email protected]> wrote:
> Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh crawl
> with One Seed
>     23024 by: Lewis John Mcgibbney
>     23031 by: kaveh minooie
>
> the fields: baseUrl, protocolStatus, reprUrl, batchId are null and the
> outlinks is empty. I am still in the process of familiarizing myself with
> the code, so I can't say it for sure, and I apologize for asking stupid
> questions while we are at it, but this doesn't seem right to me. Am I right
> to assume that the mentioned fields, or at least most of them, should have
> values?

This should definitely not happen. From a fresh crawl, I am running a normal inject, generate, fetch, parse, update with Nutch 2.x HEAD, gora-core 0.2.1 and gora-cassandra 0.2.
I've uploaded my webtable dump after each phase, and the output for each stage can be seen here [2].
You can clearly see that the fields you mention above are populated at the following stages:

batchId: Generate
baseUrl: Fetch
protocolStatus: Fetch
reprUrl: NULL????
outlinks: Parse

> also, the example that I am showing here is not a one-off; these fields
> have the same value for all, emphasis on ALL, of the few thousand URLs that
> I have fetched and with which I am playing to test the code.

For my own sanity, I would like to get to the bottom of this. However, at this stage, I cannot reproduce your situation.

[0] https://issues.apache.org/jira/secure/attachment/12576855/NUTCH-1532-v3.patch
[1] https://issues.apache.org/jira/secure/attachment/12576254/NUTCH-1551.patch
[2] http://people.apache.org/~lewismc/nutch_test/
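
If it helps with comparing dumps, the field-to-stage mapping in this message can be turned into a quick check. This is only a rough sketch in plain Python, not Nutch code: the dict layout of `row`, the function name `missing_fields` and the sample values are all made up for illustration, and reprUrl is left out on purpose because its stage is still an open question here.

```python
# Phase order of the crawl cycle as listed in this message.
PHASES = ["inject", "generate", "fetch", "parse", "update"]

# Stage at which each field was seen to be populated in the dumps.
# reprUrl is deliberately omitted: its stage is not established.
FIRST_SET_IN = {
    "batchId": "generate",
    "baseUrl": "fetch",
    "protocolStatus": "fetch",
    "outlinks": "parse",
}


def missing_fields(row, last_phase):
    """Return the fields that should be set after last_phase but are null or empty."""
    done = PHASES.index(last_phase)
    missing = []
    for field, phase in FIRST_SET_IN.items():
        # A field counts as missing if its stage has run and it is None or empty.
        if PHASES.index(phase) <= done and not row.get(field):
            missing.append(field)
    return sorted(missing)


# Hypothetical row resembling the reported symptom (values are invented).
row = {
    "batchId": None,
    "baseUrl": None,
    "protocolStatus": None,
    "reprUrl": None,
    "outlinks": {},
}
print(missing_fields(row, "parse"))
```

Running the check against a row after each phase shows the first stage at which a field that should have been written is still empty.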

