Shubham Gupta created NUTCH-2339:
------------------------------------
Summary: Nutch does not fetch documents with the -all argument
Key: NUTCH-2339
URL: https://issues.apache.org/jira/browse/NUTCH-2339
Project: Nutch
Issue Type: Bug
Components: nutchNewbie
Affects Versions: 2.3.1
Environment: Nutch 2.3.1 + Hadoop 2.7.1
Reporter: Shubham Gupta
Fix For: 2.4
I have deployed Nutch on a Hadoop server. Whenever I check the document
counts, I see a very large number of documents with status 1 and comparatively
few with status 2.
The statistics are as follows:
{ "status" : null, "count" : 16 }
{ "status" : 1, "count" : 358437 }
{ "status" : 2, "count" : 92021 }
{ "status" : 3, "count" : 7354 }
{ "status" : 4, "count" : 2807 }
{ "status" : 5, "count" : 4042 }
{ "status" : 34, "count" : 2767 }
{ "status" : 38, "count" : 229 }
To get the status 1 documents fetched, I have to run the command separately;
only then does it start fetching them. Is there a fix for this problem?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)