[ https://issues.apache.org/jira/browse/NUTCH-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma closed NUTCH-957.
-------------------------------
Bulk close of resolved issues for 1.3.
> fetcher.timelimit.mins is invalid when depth is greater than 1
> --------------------------------------------------------------
>
> Key: NUTCH-957
> URL: https://issues.apache.org/jira/browse/NUTCH-957
> Project: Nutch
> Issue Type: Bug
> Components: fetcher
> Affects Versions: 1.2
> Environment: openSUSE 11.3, jdk-1.6, ant-1.8, tomcat-6.0, nutch-1.2
> Reporter: Wade Lau
> Fix For: 1.3
>
>
> The value of fetcher.timelimit.mins becomes invalid when running
> ./bin/nutch crawl with depth=n (n>1).
> The reason is that fetcher.timelimit.mins is overwritten by the
> following code in org.apache.nutch.fetcher.Fetcher:
>   long timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
>   if (timelimit != -1) {
>     timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
>     LOG.info("Fetcher Timelimit set for : " + timelimit);
>     getConf().setLong("fetcher.timelimit.mins", timelimit);
>   }
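> When the crawler moves on to the next depth, the value read back is no
> longer the number of minutes but the absolute deadline stored by the
> previous round (currentTimeMillis + timelimit.mins * 60 * 1000), and
> that deadline is then treated as minutes again.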
> Some logs look like:
> depth=1
> Fetcher: starting at 2011-01-16 20:58:53
> Fetcher: segment: crawl/segments/20110116205851
> Fetcher Timelimit set for : 1295182793540 now is:[1295182733540]
> timelimit:[1] new.sum:[1295182793540]
> depth=2
> Fetcher: starting at 2011-01-16 21:00:20
> Fetcher: segment: crawl/segments/20110116210018
> Fetcher Timelimit set for : 77712262795220167 now is:[1295182820167]
> timelimit:[1295182793540] new.sum:[77712262795220167]
> A simple fix is to keep the original value under a separate key:
>   long timelimit = getConf().getLong("fetcher.timelimit.mins.init", -1);
>   if (timelimit == -1) {
>     timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
>     getConf().setLong("fetcher.timelimit.mins.init", timelimit);
>   }
>   if (timelimit != -1) {
>     timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
>     LOG.info("Fetcher Timelimit set for : " + timelimit);
>     getConf().setLong("fetcher.timelimit.mins", timelimit);
>   }
> I hope this is helpful for the next release and saves time for others.
> Reference:
> http://ufqi.com/exp/x1183.html?title=apache.nutch.timelimit.bug
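
The arithmetic in the quoted logs can be reproduced with a small standalone sketch. It is not the Nutch source: a plain Map stands in for the Hadoop Configuration object, the class and method names (TimelimitSketch, buggyDeadline, fixedDeadline) are made up for illustration, and the "now" timestamps are the ones shown in the logs of the report.

```java
import java.util.HashMap;
import java.util.Map;

public class TimelimitSketch {
    public static final String KEY = "fetcher.timelimit.mins";
    public static final String INIT_KEY = "fetcher.timelimit.mins.init";

    // Logic as quoted in the report: the minutes setting is overwritten
    // with an absolute deadline in milliseconds.
    public static long buggyDeadline(Map<String, Long> conf, long nowMillis) {
        long timelimit = conf.getOrDefault(KEY, -1L);
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put(KEY, timelimit);
        }
        return timelimit;
    }

    // Fix proposed in the report: remember the original minutes under a
    // separate key and always compute the deadline from that value.
    public static long fixedDeadline(Map<String, Long> conf, long nowMillis) {
        long timelimit = conf.getOrDefault(INIT_KEY, -1L);
        if (timelimit == -1) {
            timelimit = conf.getOrDefault(KEY, -1L);
            conf.put(INIT_KEY, timelimit);
        }
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put(KEY, timelimit);
        }
        return timelimit;
    }

    public static void main(String[] args) {
        long nowDepth1 = 1295182733540L; // "now is" at depth=1 in the logs
        long nowDepth2 = 1295182820167L; // "now is" at depth=2 in the logs

        Map<String, Long> buggy = new HashMap<>();
        buggy.put(KEY, 1L); // one minute, as in the report
        System.out.println(buggyDeadline(buggy, nowDepth1)); // 1295182793540
        System.out.println(buggyDeadline(buggy, nowDepth2)); // 77712262795220167

        Map<String, Long> fixed = new HashMap<>();
        fixed.put(KEY, 1L);
        System.out.println(fixedDeadline(fixed, nowDepth1)); // 1295182793540
        System.out.println(fixedDeadline(fixed, nowDepth2)); // 1295182880167
    }
}
```

With the original logic the second round multiplies the previous deadline by 60 * 1000, which matches the 77712262795220167 value in the depth=2 log. With the separate key, the second deadline is again one minute after the current time.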
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira