sebastian-nagel commented on code in PR #880:
URL: https://github.com/apache/nutch/pull/880#discussion_r2671659078


##########
src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:
##########
@@ -389,7 +402,8 @@ public static void main(String[] args) throws Exception {
           (p.getFetchInterval() / SECONDS_PER_DAY), miss);
       if (p.getFetchTime() <= curTime) {
         fetchCnt++;
-        fs.setFetchSchedule(new Text("http://www.example.com"), p, p
+        // why was "http://www.example.com" hard-coded here?

Review Comment:
   Likely because the API requires a URL, although it is not relevant here. 
An empty string is fine, but the comment should also be removed.



##########
src/java/org/apache/nutch/crawl/AdaptiveFetchSchedule.java:
##########
@@ -332,17 +333,29 @@ public CrawlDatum setFetchSchedule(Text url, CrawlDatum datum,
       case FetchSchedule.STATUS_UNKNOWN:
         break;
       }
-      if (SYNC_DELTA) {
-        // try to synchronize with the time of change
-        long delta = (fetchTime - modifiedTime) / 1000L;
-        if (delta > interval)
-          interval = delta;
-        refTime = fetchTime - Math.round(delta * SYNC_DELTA_RATE * 1000);
-      }
 
       // Ensure the interval does not fall outside of bounds
       float minInterval = (getCustomMinInterval(url) != null) ? getCustomMinInterval(url) : MIN_INTERVAL;
       float maxInterval = (getCustomMaxInterval(url) != null) ? getCustomMaxInterval(url) : MAX_INTERVAL;
+      
+      if (SYNC_DELTA) {
+        // try to synchronize with the time of change
+        long delta = (fetchTime - modifiedTime);
+        if (delta > (interval * 1000))
+          interval = delta / 1000L;
+        // offset: a fraction (sync_delta_rate) of the difference between the last modification time, and the last fetch time.
+        long offset = Math.round(delta * SYNC_DELTA_RATE);
+        long maxIntervalMillis = (long) maxInterval * 1000L;
+        LOG.trace("delta (days): " + Duration.ofMillis(delta).toDays() 

Review Comment:
   Especially for debug and trace logs, parameterized logging is recommended. 
See the [slf4j FAQ about 
performance](https://www.slf4j.org/faq.html#logging_performance).
   
   However, because three Duration objects are created, it's also ok to 
wrap the log call in an `if (LOG.isTraceEnabled())` condition.
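
   A minimal sketch of the guarded variant, not the actual patch. It uses the JDK's `System.Logger` as a stand-in so that it runs without dependencies; with SLF4J the equivalents would be `LOG.isTraceEnabled()` and `LOG.trace("... {}", arg)`. The class name, the helper methods, and the choice of the three logged values (delta, offset, max interval) are assumptions, since the diff above is cut off after the first one.

```java
import java.time.Duration;

// Hypothetical stand-alone class; in Nutch this logic would live inside
// AdaptiveFetchSchedule and use its SLF4J logger instead.
class TraceLogSketch {

  // JDK logger used as a dependency-free stand-in for the SLF4J LOG field.
  private static final System.Logger LOG =
      System.getLogger(TraceLogSketch.class.getName());

  /** Converts milliseconds to whole days, allocating one Duration object. */
  static long toDays(long millis) {
    return Duration.ofMillis(millis).toDays();
  }

  /**
   * Logs the three values at trace level. The guard ensures that the
   * Duration objects are only created when trace logging is enabled.
   */
  static void logSync(long deltaMillis, long offsetMillis,
      long maxIntervalMillis) {
    if (LOG.isLoggable(System.Logger.Level.TRACE)) {
      // Parameterized message: no string concatenation in the caller.
      // System.Logger uses MessageFormat placeholders ({0}), SLF4J uses {}.
      LOG.log(System.Logger.Level.TRACE,
          "delta (days): {0}, offset (days): {1}, max interval (days): {2}",
          toDays(deltaMillis), toDays(offsetMillis),
          toDays(maxIntervalMillis));
    }
  }

  public static void main(String[] args) {
    long dayMillis = 86_400_000L;
    logSync(3 * dayMillis, dayMillis, 30 * dayMillis);
  }
}
```

   Parameterized logging alone only avoids building the message string; the arguments are still evaluated before the call. The explicit guard is what skips the Duration allocations when trace is off.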



##########
src/java/org/apache/nutch/fetcher/FetcherThread.java:
##########
@@ -389,7 +389,7 @@ public void run() {
               }
               continue;
             }
-            if (!rules.isAllowed(fit.u)) {

Review Comment:
   How is this change related to NUTCH-1564?
   
   It reverts a change done in PR #874 / NUTCH-3136. Possibly a rebase issue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.