Nutch2.x Null Pointer Exception in IndexerJob.Java for a fresh crawl with One Seed.

Binoy d Sun, 31 Mar 2013 20:26:03 -0700

Hi,

I have Nutch 2.x set up with MySQL and am seeing a peculiar null pointer
exception during a crawl with sample seeds from DMOZ. I decided to do a fresh
crawl with only one URL as the seed and an empty webpage table.
I am running *org.apache.nutch.crawl.Crawler* from Eclipse with args *urls
-dir /home/binoy/lab/dmoz/apache-url -solr http://localhost:8983/solr/
-depth 1 -topN 1*


The apache-url seed file has only one entry ("http://nutch.apache.org/").


I see the following NullPointerException (logs):
http://pastebin.com/CaqJpPkn

With a little debugging in Eclipse, I see that

        conf.set(GeneratorJob.BATCH_ID, batchId);

in the createIndexJob method of IndexerJob.java is the root cause.

Wrapping it in *if (batchId != null)* seems to solve the issue.
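
The failure mode and the guard can be sketched without Nutch or Hadoop on the
classpath. This is only an illustration under an assumption: that the
Configuration in use stores its values in a java.util.Properties-style table,
which rejects null values. The class name, method names, and key string below
are hypothetical stand-ins, not Nutch or Hadoop API.

```java
import java.util.Properties;

public class NullBatchIdSketch {
    // Hypothetical key, standing in for GeneratorJob.BATCH_ID.
    static final String BATCH_ID_KEY = "example.batch.id";

    // Unguarded: same call shape as the original line in createIndexJob.
    // Properties.setProperty throws NullPointerException for a null value.
    static void setUnguarded(Properties props, String batchId) {
        props.setProperty(BATCH_ID_KEY, batchId);
    }

    // Guarded: same shape as the proposed patch; a null batchId is skipped
    // and the key is simply left unset.
    static void setGuarded(Properties props, String batchId) {
        if (batchId != null) {
            props.setProperty(BATCH_ID_KEY, batchId);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();

        setGuarded(props, null);
        System.out.println("guarded, key present: "
                + props.containsKey(BATCH_ID_KEY));

        try {
            setUnguarded(props, null);
        } catch (NullPointerException e) {
            System.out.println("unguarded call threw NullPointerException");
        }
    }
}
```

With the guard, the key is left absent rather than set to null, so any code
that later reads it has to cope with a missing value.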

I wanted to know if this is a valid patch. From grepping, it seems no one
else is reading GeneratorJob.BATCH_ID except IndexerJob.

I always see batchId passed as null to createIndexJob for clean
crawls (empty table). Which scenario causes it to be non-null, and what is
the significance of the generator job's batchId for the indexing job?

It seems a trivial issue, so I did not create a JIRA issue. I have attached
the small patch and would be glad if someone could take a look.

Regards,
Binoy

Index: IndexerJob.java
===================================================================
--- IndexerJob.java     (revision 1454771)
+++ IndexerJob.java     (working copy)
@@ -125,7 +125,11 @@
 
   protected Job createIndexJob(Configuration conf, String jobName, String batchId)
   throws IOException, ClassNotFoundException {
-    conf.set(GeneratorJob.BATCH_ID, batchId);
+         
+    if(batchId != null){
+       conf.set(GeneratorJob.BATCH_ID, batchId);
+    }
+    
     Job job = new NutchJob(conf, jobName);
     // TODO: Figure out why this needs to be here
     job.getConfiguration().setClass("mapred.output.key.comparator.class",
