Flag for generate to fetch only new pages to complement the -refetchonly flag
-----------------------------------------------------------------------------

         Key: NUTCH-49
         URL: http://issues.apache.org/jira/browse/NUTCH-49
     Project: Nutch
        Type: New Feature
  Components: fetcher  
    Reporter: Luke Baker
    Priority: Minor
 Attachments: fetchnewonly.patch

It would be useful, especially for research/testing purposes, to have a flag
for the FetchListTool that makes sure the fetchlist includes only URLs that
have not already been fetched (according to the information in the webdb that
you're generating the fetchlist from).
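
As a rough illustration of the requested behaviour (not the attached patch,
and not Nutch's actual API), the filtering could look something like the
sketch below. The PageEntry class, its alreadyFetched field, the
selectUnfetched method, and the fetchNewOnly parameter are all hypothetical
names made up for this example; how the webdb really records whether a page
has been fetched is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FetchNewOnlySketch {

    /** Hypothetical stand-in for a webdb page entry; not Nutch's real Page class. */
    static final class PageEntry {
        final String url;
        final boolean alreadyFetched; // assumed flag, for illustration only

        PageEntry(String url, boolean alreadyFetched) {
            this.url = url;
            this.alreadyFetched = alreadyFetched;
        }
    }

    /**
     * Builds a fetchlist from the given entries, preserving input order.
     * When fetchNewOnly is true, entries already marked as fetched are skipped.
     */
    static List<String> selectUnfetched(List<PageEntry> webDbEntries, boolean fetchNewOnly) {
        List<String> fetchList = new ArrayList<>();
        for (PageEntry entry : webDbEntries) {
            if (fetchNewOnly && entry.alreadyFetched) {
                continue; // already fetched according to the webdb, so leave it out
            }
            fetchList.add(entry.url);
        }
        return fetchList;
    }

    public static void main(String[] args) {
        List<PageEntry> entries = Arrays.asList(
            new PageEntry("http://example.com/a", true),
            new PageEntry("http://example.com/b", false),
            new PageEntry("http://example.com/c", false));

        // Prints [http://example.com/b, http://example.com/c]
        System.out.println(selectUnfetched(entries, true));
    }
}
```

With the flag off, the same call would return all three URLs, so the default
fetchlist generation is unchanged.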

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
