[ https://issues.apache.org/jira/browse/NUTCH-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882581#action_12882581 ]
Alex McLintock commented on NUTCH-478:
--------------------------------------
I'm quite keen on this idea. The lack of this feature made things quite
difficult when testing.
I have one small comment...
> User creates a file named "FetchStop" in nutch home.
Presumably it should be in the top directory of the crawl - not in nutch home.
If it were in nutch home then you would switch off all crawls currently going
on - and there may be more than one.
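
As a rough sketch of that suggestion, the marker file could be resolved against
the crawl directory instead of the Nutch installation directory, so that each
crawl has its own off switch. The class name StopMarker and the idea of passing
in a crawl directory are assumptions for illustration; only the file name
"FetchStop" comes from the issue.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Nutch: one stop marker per crawl directory. */
final class StopMarker {
    // File name taken from the issue description.
    static final String MARKER_NAME = "FetchStop";

    private final Path marker;

    StopMarker(Path crawlDir) {
        // Resolve against the crawl's own directory, not a shared nutch home.
        this.marker = crawlDir.resolve(MARKER_NAME);
    }

    /** True once someone has created the marker file in this crawl's directory. */
    boolean stopRequested() {
        return Files.exists(marker);
    }

    Path path() {
        return marker;
    }
}
```

With this layout, creating crawlA/FetchStop would stop only the crawl running in
crawlA, and a second crawl in crawlB would carry on.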
> Add function for stopping FetherThread gracefully
> -------------------------------------------------
>
> Key: NUTCH-478
> URL: https://issues.apache.org/jira/browse/NUTCH-478
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher
> Affects Versions: 0.9.0
> Reporter: chee.wu
>
> Currently the fetch process stops only when a timeout occurs during the
> fetch:
> "(System.currentTimeMillis() - lastRequestStart.get()) > timeout"
> There is no way to tell the fetch process to stop. Sometimes we have a
> strict time window for fetching, for example from 11pm to 7am. I want to
> shut down the fetch process at 7am every day, even if pages in the
> generated segments remain unfetched.
> A possible solution might be:
> 1. User creates a file named "FetchStop" in nutch home.
> 2. The main thread checks for the existence of the file every minute and,
> if it exists, sets a boolean variable such as "stopFetch" to true.
> 3. Each FetcherThread checks the status of "stopFetch" before fetching the
> next URL. If it has changed to true, the FetcherThread stops immediately
> and the value of activeThreads is reduced.
> 4. Finally, the main thread ends when activeThreads=0.
>
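
The four steps quoted above could be sketched roughly as follows. This is not
Nutch's Fetcher code: the class GracefulFetchSketch, the URL queue, the
configurable poll interval and the sleep that stands in for a real fetch are all
assumptions made for illustration. Only the names stopFetch and activeThreads
come from the issue.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; not the actual Nutch Fetcher. */
final class GracefulFetchSketch {
    // Step 2: flag set by the main thread; volatile so workers see the change.
    private volatile boolean stopFetch = false;
    private final AtomicInteger activeThreads = new AtomicInteger();
    private final AtomicInteger fetched = new AtomicInteger();
    private final Queue<String> urls;   // must be a thread-safe queue
    private final Path stopFile;
    private final long pollMillis;      // the issue suggests one minute

    GracefulFetchSketch(Queue<String> urls, Path stopFile, long pollMillis) {
        this.urls = urls;
        this.stopFile = stopFile;
        this.pollMillis = pollMillis;
    }

    /** Step 3: check the flag before taking the next URL. */
    private void worker() {
        try {
            String url;
            while (!stopFetch && (url = urls.poll()) != null) {
                Thread.sleep(5);            // placeholder for fetching url
                fetched.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            activeThreads.decrementAndGet();  // the thread is no longer active
        }
    }

    /** Steps 2 and 4: poll for the stop file until no worker is active. */
    int run(int threads) throws InterruptedException {
        activeThreads.set(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(this::worker, "fetcher-" + i).start();
        }
        while (activeThreads.get() > 0) {
            if (Files.exists(stopFile)) {
                stopFetch = true;
            }
            Thread.sleep(pollMillis);
        }
        return fetched.get();               // number of URLs actually fetched
    }
}
```

A worker finishes the URL it is currently fetching before it sees the flag, so
the stop is graceful rather than immediate, and URLs left in the queue stay
unfetched.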