igiguere opened a new pull request, #880: URL: https://github.com/apache/nutch/pull/880
### Ticket
https://issues.apache.org/jira/browse/NUTCH-1564

### Description
For a full description of the issue, please refer to the ASF Jira ticket.

### Solution
If the `offset` calculated from the `delta` (the difference between the last fetch time and the last modification time) and `sync_delta_rate` is larger than `max_interval`, the `offset` is recalculated proportionally to `max_interval`. This ensures that when the `interval` (most likely `max_interval`) is added to `refTime`, the resulting new `fetchTime` is not in the past, which would trigger an immediate re-fetch.

Note that I also played with some "brute force" ideas:
- if `offset` > `max_interval`, then set `refTime` to the current `fetchTime`
- if `offset` > `max_interval`, then reset `offset` to `offset` - `max_interval` (e.g. 9-7=2) and calculate `refTime` from that as before (equivalent to `fetchTime` - 2 in the example)

The suggested approach allows a smooth-ish selection of the next fetch time, relative to the gap between the fetch time and the last modification time.

### Tests
Unit tests added, illustrating a few situations based on the description of NUTCH-1564. Functional tests remain to be done on a long-running installation, which I don't have.

* Nutch is successfully built and unit tests pass by running `ant clean runtime test`
* there should be no conflicts when merging the pull request branch into the *recent* master branch. If there are conflicts, please try to rebase the pull request branch on top of a freshly pulled master branch.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
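The approach described under "Solution" in the pull request can be illustrated with a minimal Java sketch. This is not the code from the pull request: the class name `SyncDeltaSketch`, both method names, and the use of plain seconds are hypothetical, and the rescaling formula (`max_interval` multiplied by `sync_delta_rate`) is only one possible reading of "recalculated proportionally to `max_interval`". The minimum-interval check and the interval increase/decrease steps of the real schedule are left out.

```java
// Hypothetical sketch; names and the rescaling formula are assumptions,
// not taken from the pull request.
final class SyncDeltaSketch {

    /** Offset (in seconds) subtracted from the fetch time to obtain refTime. */
    static long clampedOffset(long deltaSec, double syncDeltaRate, long maxIntervalSec) {
        long offset = Math.round(deltaSec * syncDeltaRate);
        if (offset > maxIntervalSec) {
            // Assumed rule: rescale the offset relative to max_interval so that
            // refTime + max_interval cannot fall before the current fetch time
            // (holds for a sync_delta_rate below 1).
            offset = Math.round(maxIntervalSec * syncDeltaRate);
        }
        return offset;
    }

    /** Next fetch time (in seconds) for a page last modified at modifiedTimeSec. */
    static long nextFetchTime(long fetchTimeSec, long modifiedTimeSec, long intervalSec,
                              double syncDeltaRate, long maxIntervalSec) {
        long delta = fetchTimeSec - modifiedTimeSec;
        // Stretch the interval to the observed delta, then cap it at max_interval.
        long interval = Math.min(Math.max(intervalSec, delta), maxIntervalSec);
        long refTime = fetchTimeSec - clampedOffset(delta, syncDeltaRate, maxIntervalSec);
        return refTime + interval;
    }
}
```

With a `delta` of 30 days, a `sync_delta_rate` of 0.3 and a `max_interval` of 7 days, the unclamped offset is 9 days, so `refTime` plus 7 days would land 2 days in the past. Under the assumed rule the offset shrinks to roughly 2.1 days and the next fetch time falls about 4.9 days after the current fetch.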

