Hi Kiran,

I was running the org.apache.nutch.crawl.Crawler class from within Eclipse (Run As configuration option) with the usual arguments:

urls -dir /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/ -depth 1 -topN 1

Thanks for the tip on remote debugging. It seems the latest 2.x revision is broken: I just did an Update to HEAD and I am now seeing a completely different exception. Let me revert the workspace and look at it again. I was able to reproduce the original issue consistently before I did the svn update.
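For reference, the guard I described in my first mail amounts to roughly the following. This is only a minimal standalone sketch, not the attached patch: it uses java.util.Properties as a stand-in for the Hadoop Configuration (assuming the configuration rejects null values in the same way), and the class name, the helper method and the key string are hypothetical names chosen for illustration.

```java
import java.util.Properties;

public class BatchIdGuard {
    // Hypothetical key, standing in for GeneratorJob.BATCH_ID.
    public static final String BATCH_ID = "example.batch.id";

    // Only store the batch id when one was actually supplied.
    // Properties.setProperty(key, null) throws a NullPointerException,
    // which is the failure mode assumed here for the real configuration.
    public static void setBatchId(Properties conf, String batchId) {
        if (batchId != null) {
            conf.setProperty(BATCH_ID, batchId);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Clean crawl: no batch id available, so nothing is written.
        setBatchId(conf, null);
        System.out.println(conf.containsKey(BATCH_ID)); // prints "false"

        // A batch id is available: it is stored as usual.
        setBatchId(conf, "b1");
        System.out.println(conf.getProperty(BATCH_ID)); // prints "b1"
    }
}
```

With the guard in place the key is simply absent when batchId is null, so anything reading it later would have to cope with a missing value. Whether that is the right behaviour for the indexer is exactly what I am unsure about.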
Regards,
Binoy

On Sun, Mar 31, 2013 at 8:48 PM, kiran chitturi <[email protected]> wrote:

> Hi Binoy,
>
> Thanks for reporting the issue and for debugging it.
>
> Did you try using individual commands or the crawl script instead of the
> crawl command?
>
> You can try running Nutch remotely [1]. This will help you in running
> commands from the shell and debugging with Eclipse.
>
> [1]
> http://wiki.apache.org/nutch/RunNutchInEclipse#Remote_Debugging_in_Eclipse
>
>
> On Sun, Mar 31, 2013 at 11:25 PM, Binoy d <[email protected]> wrote:
>
>> Hi,
>>
>> I have Nutch 2.x set up with MySQL and am seeing a peculiar null pointer
>> exception during a crawl with sample seeds from DMOZ. I decided to do a
>> fresh crawl with only one URL as seed and an empty webpage table.
>> I am running *org.apache.nutch.crawl.Crawler* from Eclipse with args *urls
>> -dir /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/
>> -depth 1 -topN 1*
>>
>> The apache-url seed file has only one entry ("http://nutch.apache.org/").
>>
>> I see the following null pointer exception. Logs:
>> http://pastebin.com/CaqJpPkn
>>
>> With a little debugging from Eclipse I see
>>
>> conf.set(GeneratorJob.BATCH_ID, batchId);
>>
>> in the createIndexJob method of IndexerJob.java being the root cause.
>>
>> Wrapping it in *if(batchId != null)* seems to solve the issue.
>>
>> I wanted to know if this is a valid patch. From grepping, it seems no one
>> else is reading GeneratorJob.BATCH_ID except IndexerJob.
>>
>> I am always seeing batchId passed as null to createIndexJob for clean
>> crawls (empty table). Which scenario causes it to be non-null, and what is
>> the significance of the generator job batchId for the indexing job?
>>
>> It seems a trivial issue and hence I did not create a JIRA. I have
>> attached the small patch and would be glad if someone can take a look.
>>
>> Regards,
>> Binoy
>
> --
> Kiran Chitturi
>
> <http://www.linkedin.com/in/kiranchitturi>

