Hi Binoy,

Thanks for reporting the issue and debugging it!

Did you try using the individual commands or the crawl script instead of the
crawl command?

You can try running Nutch remotely [1]. This will let you run the
commands from a shell and debug them in Eclipse.

[1]
http://wiki.apache.org/nutch/RunNutchInEclipse#Remote_Debugging_in_Eclipse
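For anyone following along, the null guard Binoy describes would look roughly
like the sketch below. A plain map stands in for Hadoop's Configuration (which
rejects null values), and the property key is illustrative, since neither is
quoted in full in this thread:

```java
import java.util.HashMap;
import java.util.Map;

public class BatchIdGuard {

    // Stand-in for Configuration.set(GeneratorJob.BATCH_ID, batchId);
    // the key name here is illustrative, not the actual constant's value.
    static void setBatchId(Map<String, String> conf, String batchId) {
        if (batchId != null) {  // the guard from the proposed patch
            conf.put("generator.batch.id", batchId);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Fresh crawl with an empty table: batchId arrives as null.
        // Without the guard, Configuration.set would throw here.
        setBatchId(conf, null);

        // A generated batch supplies a real id, which is stored as usual.
        setBatchId(conf, "1364772000-1");
        System.out.println(conf.get("generator.batch.id"));
    }
}
```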


On Sun, Mar 31, 2013 at 11:25 PM, Binoy d <[email protected]> wrote:

> Hi,
>
> I have Nutch 2.x set up with MySQL and am seeing a peculiar
> NullPointerException when crawling with sample seeds from DMOZ. I decided
> to do a fresh crawl with only one URL as the seed and an empty webpage table.
> I am running *org.apache.nutch.crawl.Crawler* from Eclipse with args *urls
> -dir /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/
> -depth 1 -topN 1*
>
> The apache-url seed file has only one entry ("http://nutch.apache.org/")
>
>
> I see the following NullPointerException. Logs:
> http://pastebin.com/CaqJpPkn
>
> With a little debugging in Eclipse, I see that
>
>         conf.set(GeneratorJob.BATCH_ID, batchId);
>
> in IndexerJob.java's createIndexJob method is the root cause.
>
> Wrapping it in *if (batchId != null)* seems to solve the issue.
>
> I wanted to know if this is a valid patch. From grepping, it seems nothing
> else reads GeneratorJob.BATCH_ID except IndexerJob.
>
> I always see batchId passed as null to createIndexJob for clean crawls
> (empty table). Which scenario causes it to be non-null, and what is the
> significance of the generator job's batchId for the indexing job?
>
> It seems a trivial issue, so I did not create a JIRA. I have attached
> the small patch and would be glad if someone could take a look.
>
> Regards,
> Binoy
>
>
>


-- 
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>
