Flag for generate to fetch only new pages to complement the -refetchonly flag
-----------------------------------------------------------------------------
Key: NUTCH-49
URL: http://issues.apache.org/jira/browse/NUTCH-49
Project: Nutch
Type: New Feature
Components: fetcher
Reporter: Luke Baker
Priority: Minor
Attachments: fetchnewonly.patch
It would be useful, especially for research/testing purposes, to have a flag
for the FetchListTool that makes sure the fetchlist only includes URLs that
have not already been fetched (according to the information in the webdb that
you're generating the fetchlist from).
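
A minimal sketch of the requested behaviour, in plain Java. It is not the
attached patch and does not use the real Nutch API: the PageRecord class, its
fetched field, and the newOnly method are hypothetical stand-ins for whatever
the webdb actually records about a page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FetchNewOnlySketch {

    /** Hypothetical stand-in for a webdb page entry; not a Nutch class. */
    static final class PageRecord {
        final String url;
        final boolean fetched; // assumption: the webdb can tell us this

        PageRecord(String url, boolean fetched) {
            this.url = url;
            this.fetched = fetched;
        }
    }

    /**
     * Returns the URLs of pages that have never been fetched,
     * preserving the order in which they appear in the input.
     */
    static List<String> newOnly(List<PageRecord> pages) {
        List<String> fetchlist = new ArrayList<>();
        for (PageRecord page : pages) {
            if (!page.fetched) {
                fetchlist.add(page.url);
            }
        }
        return fetchlist;
    }

    public static void main(String[] args) {
        List<PageRecord> webdb = Arrays.asList(
                new PageRecord("http://example.org/old", true),
                new PageRecord("http://example.org/new", false));

        // Only the page that has not been fetched yet is listed.
        System.out.println(newOnly(webdb));
    }
}
```

This is the complement of a refetch-only selection: the same predicate,
negated, would keep only the pages that have already been fetched.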