sreemanth pulagam created NUTCH-1774:
----------------------------------------

             Summary: Crawling from REST API giving NullPointerException
                 Key: NUTCH-1774
                 URL: https://issues.apache.org/jira/browse/NUTCH-1774
             Project: Nutch
          Issue Type: Bug
          Components: REST_api
    Affects Versions: 2.2.1
            Reporter: sreemanth pulagam


Crawling does not work when started through the REST API: the job fails in the Generator phase with a NullPointerException.

Steps to reproduce.
-----------------------
1. Start the Nutch server (port 9000).
2. Submit a PUT request to create and start a crawl job.
   e.g.:
           URL: http://localhost:9000/nutch/jobs  
           HTTP METHOD: PUT
           Content: 
                {
                   "crawl":"123",
                   "type":"crawl",
                   "conf":"default",
                   "args":{
                      "class":"org.apache.nutch.crawl.Crawler",
                      "seed":"http://www.somesite.com",
                      "seedDir":"runtime/local/url/url.txt",
                      "depth":2
                   }
                }
3. The following exception is thrown in the Generator phase:
2014-05-13 11:37:57,863 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(435)) - job_local1326997137_0002
java.lang.NullPointerException
        at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
        at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
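
For anyone trying to reproduce this, the request from step 2 could be scripted roughly as below. This is only a sketch using Python's standard library: the URL and JSON body are copied from the report, while the function names, the Content-Type header, and the timeout are assumptions that the report does not specify.

```python
import json
import urllib.request

# Endpoint and payload copied from step 2 of the report.
NUTCH_JOBS_URL = "http://localhost:9000/nutch/jobs"

CRAWL_JOB = {
    "crawl": "123",
    "type": "crawl",
    "conf": "default",
    "args": {
        "class": "org.apache.nutch.crawl.Crawler",
        "seed": "http://www.somesite.com",
        "seedDir": "runtime/local/url/url.txt",
        "depth": 2,
    },
}


def build_crawl_request(url=NUTCH_JOBS_URL, job=CRAWL_JOB):
    """Build (but do not send) the PUT request described in step 2.

    Hypothetical helper, not part of Nutch.
    """
    body = json.dumps(job).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumption: the server accepts a JSON content type; the report does not say.
        headers={"Content-Type": "application/json"},
    )


def submit_crawl_job(request, timeout=30):
    """Send the request and return the response body as text.

    Hypothetical helper; requires the Nutch server from step 1 to be running.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server from step 1 listening on port 9000, calling `submit_crawl_job(build_crawl_request())` sends the request; the exception in step 3 should then show up in the server-side log rather than in the HTTP response.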







--
This message was sent by Atlassian JIRA
(v6.2#6252)
