[ 
https://issues.apache.org/jira/browse/NUTCH-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma closed NUTCH-957.
-------------------------------


Bulk close of resolved issues for 1.3.

> fetcher.timelimit.mins is invalid when depth is greater than 1
> --------------------------------------------------------------
>
>                 Key: NUTCH-957
>                 URL: https://issues.apache.org/jira/browse/NUTCH-957
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.2
>         Environment: openSUSE 11.3, jdk-1.6, ant-1.8, tomcat-6.0, nutch-1.2
>            Reporter: Wade Lau
>             Fix For: 1.3
>
>
> The value of fetcher.timelimit.mins becomes invalid when running 
> ./bin/nutch crawl with depth=n (n>1).
> The reason is that fetcher.timelimit.mins is overwritten by the 
> following code (org.apache.nutch.fetcher.Fetcher.java):
> long timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
> if (timelimit != -1) {
>   timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
>   LOG.info("Fetcher Timelimit set for : " + timelimit);
>   getConf().setLong("fetcher.timelimit.mins", timelimit);
> }
> When the crawler moves on to the next depth, the property holds the value 
> written by the previous round, which is currentTimeMillis + timelimit.mins 
> converted to milliseconds, and that value is then read as a number of minutes.
> Some logs look like:
> depth=1 
> Fetcher: starting at 2011-01-16 20:58:53
> Fetcher: segment: crawl/segments/20110116205851
> Fetcher Timelimit set for : 1295182793540 now is:[1295182733540] 
> timelimit:[1] new.sum:[1295182793540]
> depth=2
> Fetcher: starting at 2011-01-16 21:00:20
> Fetcher: segment: crawl/segments/20110116210018
> Fetcher Timelimit set for : 77712262795220167 now is:[1295182820167] 
> timelimit:[1295182793540] new.sum:[77712262795220167]
> A simple fix is to keep the original value under a separate key, as below:
> long timelimit = getConf().getLong("fetcher.timelimit.mins.init", -1);
> if( timelimit == -1)
> {
>     timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
>     getConf().setLong("fetcher.timelimit.mins.init", timelimit);
> }
> if (timelimit != -1) {
>   timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
>   LOG.info("Fetcher Timelimit set for : " + timelimit);
>   getConf().setLong("fetcher.timelimit.mins", timelimit);
> }
> I hope this is helpful for the next release and saves time for others.
> Reference:
> http://ufqi.com/exp/x1183.html?title=apache.nutch.timelimit.bug
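
The arithmetic in the quoted logs can be reproduced outside Nutch. The following is only a minimal sketch: it replaces the Hadoop configuration object with a plain HashMap, and the class name TimeLimitSketch and both method names are hypothetical, chosen for illustration. The two methods mirror the original snippet and the reporter's proposed change; this is not the code that was committed for 1.3.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Fetcher logic; a HashMap plays the role
// of the configuration object so the sketch runs without Nutch or Hadoop.
class TimeLimitSketch {
    static final String KEY = "fetcher.timelimit.mins";
    static final String INIT_KEY = "fetcher.timelimit.mins.init";

    // Mirrors the original snippet: the deadline in milliseconds is written
    // back under the same key that is later read as a number of minutes.
    static long setDeadlineOriginal(Map<String, Long> conf, long nowMillis) {
        long timelimit = conf.getOrDefault(KEY, -1L);
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put(KEY, timelimit);
        }
        return timelimit;
    }

    // Mirrors the reporter's proposal: the configured minutes are saved once
    // under a separate key and re-read from there on later rounds.
    static long setDeadlineProposed(Map<String, Long> conf, long nowMillis) {
        long timelimit = conf.getOrDefault(INIT_KEY, -1L);
        if (timelimit == -1) {
            timelimit = conf.getOrDefault(KEY, -1L);
            conf.put(INIT_KEY, timelimit);
        }
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put(KEY, timelimit);
        }
        return timelimit;
    }

    public static void main(String[] args) {
        // "now" values taken from the two log excerpts in the issue.
        long nowDepth1 = 1295182733540L;
        long nowDepth2 = 1295182820167L;

        Map<String, Long> original = new HashMap<>();
        original.put(KEY, 1L);
        System.out.println(setDeadlineOriginal(original, nowDepth1));
        System.out.println(setDeadlineOriginal(original, nowDepth2));

        Map<String, Long> proposed = new HashMap<>();
        proposed.put(KEY, 1L);
        System.out.println(setDeadlineProposed(proposed, nowDepth1));
        System.out.println(setDeadlineProposed(proposed, nowDepth2));
    }
}
```

With a one-minute limit, the first method yields 1295182793540 and then 77712262795220167, matching the logged values for depth 1 and depth 2. The second yields 1295182793540 and then 1295182880167, that is, one minute after each round's start time.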

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
