Lewis John McGibbney created NUTCH-2010:
-------------------------------------------

             Summary: Implement isFetchingInProgress Utility Function in Fetcher
                 Key: NUTCH-2010
                 URL: https://issues.apache.org/jira/browse/NUTCH-2010
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.10
            Reporter: Lewis John McGibbney
             Fix For: 1.11


The aim here is to stop (without killing) a Nutch crawl if the data being 
fetched is not of value to the user. The user can infer this by implementing 
some visualization on top of the backported REST API for Nutch trunk (this 
could probably also be done with the 2.X REST API).
I suggest that we implement a convenience utility function, potentially in 
Fetcher.java, which would look something like the following:
{code}
public static boolean isFetchingInProgress() {
  return fetchingInProgress;
}
{code}
fetchingInProgress should be set to true whenever fetcher threads are 
working, and set to false whenever all fetcher threads are unoccupied and 
back in the pool.
This would be a powerful mechanism for determining whether a crawl can be 
stopped safely, without the data corruption that currently happens when a 
fetcher task is interrupted.
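
As a rough illustration of how the flag described above might be maintained, 
here is a minimal sketch that derives the answer from a count of busy threads 
rather than from a separately managed boolean. The class name 
FetchActivityTracker and the methods threadStarted() and threadFinished() are 
hypothetical and are not part of the existing Fetcher code; only 
isFetchingInProgress() comes from the proposal.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not existing Nutch code: tracks how many fetcher
// threads are currently working so callers can ask whether fetching is
// in progress.
final class FetchActivityTracker {

  // Number of fetcher threads that are currently busy.
  private static final AtomicInteger activeThreads = new AtomicInteger(0);

  private FetchActivityTracker() {
  }

  // Assumed to be called by a fetcher thread when it starts working.
  static void threadStarted() {
    activeThreads.incrementAndGet();
  }

  // Assumed to be called by a fetcher thread when it goes back to the pool.
  static void threadFinished() {
    activeThreads.decrementAndGet();
  }

  // True while at least one fetcher thread is working.
  static boolean isFetchingInProgress() {
    return activeThreads.get() > 0;
  }
}
```

Deriving the value from an atomic counter means no thread has to decide when 
it is the last one to finish, which a plain static boolean would require. A 
caller such as a REST endpoint could then poll isFetchingInProgress() and only 
stop the crawl once it returns false.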



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
