[
https://issues.apache.org/jira/browse/NUTCH-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451689#comment-17451689
]
Sebastian Nagel commented on NUTCH-2910:
----------------------------------------
Hi [~lewismc], this is not a problem:
- Fetcher reads {{fetcher.timelimit.mins}},
- calculates the point in time (system millis) when fetching has to finish
  (now + minutes * 60 * 1000),
- and puts it into {{fetcher.timelimit}},
- which is read by the mapper tasks (class Fetcher.FetcherRun) and the
  FetchItemQueues.
- If the timelimit is reached, the queue feeder stops and all queues are emptied.
I can confirm that the timelimit works.
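The flow described above can be sketched roughly as follows. This is only an illustration, not the actual Nutch code: the class name {{TimeLimitSketch}} and both method names are hypothetical, and a plain {{Map}} stands in for Hadoop's Configuration so that the example runs on its own. Only the two property names and the -1 default are taken from this issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; a plain Map replaces Hadoop's Configuration (assumption). */
class TimeLimitSketch {

  /** Job setup: turn the relative limit in minutes into an absolute deadline in epoch millis. */
  static void setDeadline(Map<String, String> conf, long nowMillis) {
    long mins = Long.parseLong(conf.getOrDefault("fetcher.timelimit.mins", "-1"));
    if (mins != -1) {
      // The derived property is set programmatically; it is never configured by hand.
      conf.put("fetcher.timelimit", Long.toString(nowMillis + mins * 60 * 1000));
    }
  }

  /** Task side: read the absolute deadline; -1 means that no time limit was set. */
  static boolean timeLimitReached(Map<String, String> conf, long nowMillis) {
    long timelimit = Long.parseLong(conf.getOrDefault("fetcher.timelimit", "-1"));
    return timelimit != -1 && nowMillis >= timelimit;
  }
}
```

With {{fetcher.timelimit.mins}} left at -1 nothing is written, so the task-side lookup of {{fetcher.timelimit}} falls back to -1 and no limit applies. That is why the -1 default in FetchItemQueues is harmless.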
- Maybe it's a good idea to add a code comment saying that this property is
  "volatile" or "temporary" and "set programmatically".
- Shall we add all those properties to
  [nutch-default.xml|https://nutch.apache.org/documentation/javadoc/apidocs/resources/nutch-default.xml]?
  We could use the new "tags" element (HADOOP-15005) to mark them in a unique way.
  See also
  [https://cwiki.apache.org/confluence/display/NUTCH/NutchPropertiesCompleteList],
  but the list in the wiki is notoriously outdated.
> FetchItemQueues overloaded constructor also interprets fetcher timeout as -1
> e.g. no-timeout.
> ---------------------------------------------------------------------------------------------
>
> Key: NUTCH-2910
> URL: https://issues.apache.org/jira/browse/NUTCH-2910
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.18
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.19
>
>
> The FetchItemQueues overloaded constructor [attempts to obtain the
> *NON-EXISTENT* _fetcher.timelimit_ configuration
> property|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/fetcher/FetchItemQueues.java#L84].
> {code:java}
> this.timelimit = conf.getLong("fetcher.timelimit", -1);
> {code}
> As you can see, a default value of -1 is provided. The first parameter,
> however, is wrong. It should instead reference the following configuration
> property.
> {code:xml}
> <property>
> <name>fetcher.timelimit.mins</name>
> <value>-1</value>
> <description>This is the number of minutes allocated to the fetching.
> Once this value is reached, any remaining entry from the input URL list is skipped
> and all active queues are emptied. The default value of -1 deactivates the time limit.
> </description>
> </property>
> {code}
> Note: *_fetcher.timelimit.mins_*.
> I think that this essentially means the Fetcher has no time limit, which is
> of course not desired.