Hi, this is an issue. Below is the code of the SolrDeleteDuplicates class from Nutch 1.7 trunk, where the Solr record is deleted by its id field. Because the documents do not have the url field, their id is empty, so a NullPointerException is thrown when the job runs.
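The class source is not reproduced in this message. As a rough illustration of the failure mode only, here is a minimal sketch in plain Java. It is not the Nutch code: the class name NullKeyGuardSketch, the helpers keyOf and usableKeys, and the use of a Map as a stand-in for a Solr document are all assumptions made for the example. The idea is to check the key field for null before it is handed to anything that encodes it, which is where Text.set fails in the stack trace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Nutch implementation.
public class NullKeyGuardSketch {

    // Returns the key field as a String, or null when the document lacks it.
    public static String keyOf(Map<String, Object> doc, String keyField) {
        Object value = doc.get(keyField);
        return value == null ? null : value.toString();
    }

    // Collects keys only from documents that have a non-empty key field,
    // skipping the rest instead of passing null on to an encoder.
    public static List<String> usableKeys(List<Map<String, Object>> docs, String keyField) {
        List<String> keys = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            String key = keyOf(doc, keyField);
            if (key == null || key.isEmpty()) {
                continue; // a null here is what would otherwise trigger the exception
            }
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Object> withId = new HashMap<>();
        withId.put("id", "http://example.com/a");

        Map<String, Object> withoutId = new HashMap<>();
        withoutId.put("title", "document without an id");

        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(withId);
        docs.add(withoutId);

        // Only the first document contributes a key.
        System.out.println(usableKeys(docs, "id"));
    }
}
```

Whether skipping such records is the right behaviour for the real job is a separate question; the sketch only shows where a null check would sit.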
I am writing from my phone right now, and I did not find this issue. But if you update from 1.7 to a newer version, you will not get this error.

Talat

On Sep 2, 2014 10:22 AM, <vinay.kash...@socialinfra.net> wrote:
> Hi,
>
> I have taken the Nutch 1.7 source, copied mapred-site.xml, hdfs-site.xml,
> yarn-site.xml, hadoop-env.sh and core-site.xml from my Hadoop
> 2.3.0-cdh5.1.0 installation, and did an ant build.
>
> Then I went to runtime/deploy/bin to start the crawl. It successfully
> submitted the jobs to my YARN cluster, but later, during indexing to Solr,
> I am getting the exception below.
>
> I have copied schema-solr4.xml to my Solr and added exceptions in
> regex-urlfilter.txt for the particular website which I give for crawling
> in urls/seed.txt.
>
> Error:
> java.lang.NullPointerException
>     at org.apache.hadoop.io.Text.encode(Text.java:443)
>     at org.apache.hadoop.io.Text.set(Text.java:198)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:270)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:241)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:198)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:184)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>
> Kindly, can anyone tell me how to solve this issue? I'm basically stuck
> here!!