Hi Binoy,

Thanks for reporting the issue and for the debugging.
Did you try using the individual commands or the crawl script instead of the crawl command? You can also try running Nutch remotely [1]; this will let you run commands from the shell and debug them in Eclipse.

[1] http://wiki.apache.org/nutch/RunNutchInEclipse#Remote_Debugging_in_Eclipse

On Sun, Mar 31, 2013 at 11:25 PM, Binoy d <[email protected]> wrote:
> Hi,
>
> I have Nutch 2.x set up with MySQL and am seeing a peculiar NullPointerException
> during a crawl with sample seeds from DMOZ. I decided to do a fresh
> crawl with only one URL as seed and an empty webpage table.
> I am running *org.apache.nutch.crawl.Crawler* from Eclipse with args *urls
> -dir /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/
> -depth 1 -topN 1*
>
> The apache-url seed file has only one entry ("http://nutch.apache.org/").
>
> I see the following NullPointerException. Logs:
> http://pastebin.com/CaqJpPkn
>
> With a little debugging from Eclipse, I see
>
> conf.set(GeneratorJob.BATCH_ID, batchId);
>
> in IndexerJob.java's createIndexJob method being the root cause.
>
> Wrapping it in *if (batchId != null)* seems to solve the issue.
>
> I wanted to know if this is a valid patch. From grep-ing, it seems no one
> else is reading GeneratorJob.BATCH_ID except IndexerJob.
>
> I am always seeing batchId passed as null to createIndexJob for clean
> crawls (empty table). Which scenario causes it to be non-null? And what is
> the significance of the generator job's batchId for the indexing job?
>
> It seems a trivial issue, and hence I did not create a JIRA. I have attached
> a small patch and would be glad if someone could take a look.
>
> Regards,
> Binoy

--
Kiran Chitturi <http://www.linkedin.com/in/kiranchitturi>
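(For readers following along: the guard Binoy describes can be sketched in isolation. This is an illustrative, simplified sketch, not the actual Nutch source. `applyBatchId`, the `Map`-based stand-in for Hadoop's `Configuration`, and the `BATCH_ID` key string are assumptions for the example; the real change would live in `IndexerJob.createIndexJob`.)

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the null-check patch discussed in the thread: only set the
// generator batch id on the job configuration when one was actually
// supplied. A plain Map stands in for Hadoop's Configuration here, since
// Configuration.set rejects null values and triggers the reported NPE.
public class BatchIdGuard {
    // Assumed key name for illustration; the real constant is
    // GeneratorJob.BATCH_ID in Nutch 2.x.
    static final String BATCH_ID = "generate.batch.id";

    static void applyBatchId(Map<String, String> conf, String batchId) {
        // Fresh crawls (empty webpage table) pass batchId == null,
        // so skip the set entirely rather than crash.
        if (batchId != null) {
            conf.put(BATCH_ID, batchId);
        }
    }
}
```

With this guard, a null batch id simply leaves the configuration untouched, which matches Binoy's observation that nothing other than IndexerJob reads the key.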

