Re: Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh crawl with One Seed.

Binoy d Sun, 31 Mar 2013 23:23:16 -0700

Hi,
I am able to reproduce the issue from within Eclipse (not using the
scripts) with revision 1455209. Any later revision seems to break my
workspace, and I am not able to successfully run any crawl using either the
scripts or the Eclipse "Run As" options.


It seems the head revision of the 2.x branch (1462079) is not stable. Has
anyone been able to figure out the issue?

Regards,
Binoy


On Sun, Mar 31, 2013 at 10:34 PM, Binoy d <[email protected]> wrote:

> Hi Kiran,
>
> I was running the org.apache.nutch.crawl.Crawler class from within Eclipse
> (Run As configuration) with the usual arguments: urls -dir
> /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/ -depth
> 1 -topN 1
> Thanks for the tip on remote debugging. It seems the latest 2.x revision is
> broken: I just updated to head and am now seeing a completely different
> exception. Let me revert the workspace and look at it again; I was
> able to reproduce the issue consistently before I ran svn update.
>
> Regards,
> Binoy
>
>
>
> On Sun, Mar 31, 2013 at 8:48 PM, kiran chitturi <[email protected]
> > wrote:
>
>> Hi Binoy,
>>
>> Thanks for reporting the issue and debugging it.
>>
>> Did you try using the individual commands or the crawl script instead of
>> the crawl command?
>>
>> You can try running Nutch remotely [1]. This will let you run
>> commands from the shell and debug them using Eclipse.
>>
>> [1]
>> http://wiki.apache.org/nutch/RunNutchInEclipse#Remote_Debugging_in_Eclipse
>>
>>
>> On Sun, Mar 31, 2013 at 11:25 PM, Binoy d <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have Nutch 2.x set up with MySQL and am seeing a peculiar null pointer
>>> exception during a crawl with sample seeds from DMOZ. I decided to do a
>>> fresh crawl with only one URL as the seed and an empty webpage table.
>>> I am running *org.apache.nutch.crawl.Crawler* from Eclipse with args *urls
>>> -dir /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/
>>> -depth 1 -topN 1*
>>>
>>> The apache-url seed file has only one entry ("http://nutch.apache.org/").
>>>
>>>
>>> I see the following null pointer exception. Logs:
>>> http://pastebin.com/CaqJpPkn
>>>
>>> With a little debugging from Eclipse, I see that
>>>
>>>         conf.set(GeneratorJob.BATCH_ID, batchId);
>>>
>>> in the createIndexJob method of IndexerJob.java is the root cause.
>>>
>>> Wrapping it in *if (batchId != null)* seems to solve the issue.
>>>
>>> I wanted to know if this is a valid patch. From grepping, it seems no one
>>> else is reading GeneratorJob.BATCH_ID except IndexerJob.
>>>
>>> I always see batchId passed as null to createIndexJob for clean
>>> crawls (empty table). Which scenario causes it to be non-null, and what is
>>> the significance of the generator job's batchId for the indexing job?
>>>
>>> It seems a trivial issue, and hence I did not create a JIRA. I have
>>> attached the small patch and would be glad if someone could take a look.
>>>
>>> Regards,
>>> Binoy
>>>
>>>
>>>
>>
>>
>> --
>> Kiran Chitturi
>>
>> <http://www.linkedin.com/in/kiranchitturi>
>>
>>
>>
>
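
For anyone following along, here is a minimal sketch of the null guard proposed in the original report. It makes two assumptions that the thread does not confirm: java.util.Properties stands in for the Hadoop Configuration object (Properties rejects null values, and the report suggests conf.set does the same), and BATCH_ID_KEY is a hypothetical key name, not the real value of GeneratorJob.BATCH_ID.

```java
import java.util.Properties;

public class BatchIdGuardSketch {
    // Hypothetical key name; the actual value of GeneratorJob.BATCH_ID is not shown in the thread.
    static final String BATCH_ID_KEY = "example.batch.id";

    // Mirrors the reported line: sets the value unconditionally.
    // With Properties as the stand-in, a null batchId raises NullPointerException.
    static void setUnguarded(Properties conf, String batchId) {
        conf.setProperty(BATCH_ID_KEY, batchId);
    }

    // Mirrors the proposed patch: only set the key when a batch id was supplied.
    // Returns true if the key was written, false if it was skipped.
    static boolean setGuarded(Properties conf, String batchId) {
        if (batchId != null) {
            conf.setProperty(BATCH_ID_KEY, batchId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        try {
            setUnguarded(conf, null);
        } catch (NullPointerException e) {
            System.out.println("unguarded set failed with NullPointerException");
        }
        System.out.println("guarded set applied: " + setGuarded(conf, null));
        System.out.println("key present: " + conf.containsKey(BATCH_ID_KEY));
    }
}
```

This only shows the mechanics of the check. Whether leaving the key unset is the right behaviour for IndexerJob is still the open question raised in the report.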
