sebastian-nagel commented on code in PR #880:
URL: https://github.com/apache/nutch/pull/880#discussion_r2671659078
##########
src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:
##########
@@ -389,7 +402,8 @@ public static void main(String[] args) throws Exception {
(p.getFetchInterval() / SECONDS_PER_DAY), miss);
if (p.getFetchTime() <= curTime) {
fetchCnt++;
- fs.setFetchSchedule(new Text("http://www.example.com"), p, p
+ // why was "http://www.example.com" hard-coded here?
Review Comment:
Likely because the API requires a URL, although it is not relevant here.
It's ok to use an empty string, but the comment should also be removed.
##########
src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:
##########
@@ -332,17 +333,29 @@ public CrawlDatum setFetchSchedule(Text url, CrawlDatum datum,
case FetchSchedule.STATUS_UNKNOWN:
break;
}
- if (SYNC_DELTA) {
- // try to synchronize with the time of change
- long delta = (fetchTime - modifiedTime) / 1000L;
- if (delta > interval)
- interval = delta;
- refTime = fetchTime - Math.round(delta * SYNC_DELTA_RATE * 1000);
- }
// Ensure the interval does not fall outside of bounds
float minInterval = (getCustomMinInterval(url) != null) ?
getCustomMinInterval(url) : MIN_INTERVAL;
float maxInterval = (getCustomMaxInterval(url) != null) ?
getCustomMaxInterval(url) : MAX_INTERVAL;
+
+ if (SYNC_DELTA) {
+ // try to synchronize with the time of change
+ long delta = (fetchTime - modifiedTime);
+ if (delta > (interval * 1000))
+ interval = delta / 1000L;
+ // offset: a fraction (sync_delta_rate) of the difference between the last modification time, and the last fetch time.
+ long offset = Math.round(delta * SYNC_DELTA_RATE);
+ long maxIntervalMillis = (long) maxInterval * 1000L;
+ LOG.trace("delta (days): " + Duration.ofMillis(delta).toDays()
Review Comment:
Parameterized logging is recommended, especially for debug and trace logs.
See the [slf4j FAQ about performance](https://www.slf4j.org/faq.html#logging_performance).
However, because three Duration objects are created for this message, it's
also ok to wrap the log call in an `if (LOG.isTraceEnabled())` condition.
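
A minimal sketch of what the guarded, parameterized variant could look like. To keep it runnable without slf4j on the classpath, `StubLogger` is a hypothetical stand-in that mimics only `isTraceEnabled()` and `trace(String, Object...)` from `org.slf4j.Logger`. The method name `logSyncDelta` and the exact message fields are assumptions, since the log statement is truncated in the hunk above.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

class TraceLoggingSketch {

  /** Hypothetical stand-in for the two org.slf4j.Logger methods used below. */
  static class StubLogger {
    final boolean traceEnabled;
    final List<String> lines = new ArrayList<>();

    StubLogger(boolean traceEnabled) {
      this.traceEnabled = traceEnabled;
    }

    boolean isTraceEnabled() {
      return traceEnabled;
    }

    /** Substitutes "{}" placeholders in order, but only if trace is enabled. */
    void trace(String format, Object... args) {
      if (!traceEnabled) {
        return;
      }
      StringBuilder sb = new StringBuilder();
      int from = 0;
      for (Object arg : args) {
        int idx = format.indexOf("{}", from);
        if (idx < 0) {
          break;
        }
        sb.append(format, from, idx).append(arg);
        from = idx + 2;
      }
      sb.append(format.substring(from));
      lines.add(sb.toString());
    }
  }

  /** Assumed shape of the trace call: all values are in milliseconds. */
  static void logSyncDelta(StubLogger log, long deltaMillis, long offsetMillis,
      long maxIntervalMillis) {
    // The guard skips the three Duration allocations when trace is off.
    if (log.isTraceEnabled()) {
      log.trace("delta (days): {}, offset (days): {}, maxInterval (days): {}",
          Duration.ofMillis(deltaMillis).toDays(),
          Duration.ofMillis(offsetMillis).toDays(),
          Duration.ofMillis(maxIntervalMillis).toDays());
    }
  }

  public static void main(String[] args) {
    StubLogger log = new StubLogger(true);
    // 2 days, 1 day and 3 days, expressed in milliseconds
    logSyncDelta(log, 172_800_000L, 86_400_000L, 259_200_000L);
    log.lines.forEach(System.out::println);
  }
}
```

With a real slf4j `Logger`, the body of `logSyncDelta` would stay the same. The placeholders avoid string concatenation, and the guard avoids creating the Duration objects when trace logging is disabled.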
##########
src/java/org/apache/nutch/fetcher/FetcherThread.java:
##########
@@ -389,7 +389,7 @@ public void run() {
}
continue;
}
- if (!rules.isAllowed(fit.u)) {
Review Comment:
How is this change related to NUTCH-1564?
It reverts a change done in PR #874 / NUTCH-3136. Possibly a rebase issue?