lewismc commented on a change in pull request #724:
URL: https://github.com/apache/nutch/pull/724#discussion_r785373191
##########
File path: src/java/org/apache/nutch/fetcher/FetchItemQueues.java
##########
@@ -195,11 +195,15 @@ public synchronized FetchItem getFetchItem() {
return null;
}
+ public boolean timelimitReached() {
Review comment:
Maybe provide basic Javadoc?
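For illustration only, here is a minimal sketch of what basic Javadoc for such a method might look like. It assumes the limit is stored as an absolute deadline in epoch milliseconds, with `-1` meaning that no limit is configured. `TimeLimitSketch` and its `timelimit` field are hypothetical stand-ins, not the code in this PR.

```java
/** Hypothetical stand-in for the time-limit bookkeeping in FetchItemQueues. */
public class TimeLimitSketch {

    /** Absolute deadline in epoch milliseconds, or -1 if no limit is set (assumption). */
    private final long timelimit;

    public TimeLimitSketch(long timelimit) {
        this.timelimit = timelimit;
    }

    /**
     * Checks whether the fetcher's time limit has been reached.
     *
     * @return {@code true} if a time limit is configured and the current time
     *         is at or past it; {@code false} otherwise
     */
    public boolean timelimitReached() {
        return timelimitReached(System.currentTimeMillis());
    }

    /** Same check with the current time passed in, which keeps it testable. */
    boolean timelimitReached(long nowMillis) {
        return timelimit != -1 && nowMillis >= timelimit;
    }

    public static void main(String[] args) {
        TimeLimitSketch noLimit = new TimeLimitSketch(-1);
        TimeLimitSketch expired = new TimeLimitSketch(1000L);
        System.out.println(noLimit.timelimitReached());      // false
        System.out.println(expired.timelimitReached(2000L)); // true
    }
}
```

Passing the clock value in explicitly is only a testing convenience; the public no-argument method keeps the signature shown in the diff.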
##########
File path: src/java/org/apache/nutch/fetcher/FetchItemQueues.java
##########
@@ -263,6 +283,10 @@ public synchronized int checkExceptionThreshold(String queueid) {
return 0;
}
+ public int checkExceptionThreshold(String queueid) {
Review comment:
Same here. Basic Javadoc?
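Along the same lines, here is a hypothetical sketch of Javadoc for the one-argument overload. It assumes the overload delegates to a variant taking an explicit threshold and returns the number of URLs purged from the queue. `ExceptionThresholdSketch`, `addUrl`, and `recordException` are made up for this example, and the `>=` comparison is an assumption that should be checked against the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for per-queue exception tracking in FetchItemQueues. */
public class ExceptionThresholdSketch {

    /** Default threshold; -1 disables the check (assumption). */
    private final int maxExceptionsPerQueue;
    private final Map<String, List<String>> queues = new HashMap<>();
    private final Map<String, Integer> exceptionCounts = new HashMap<>();

    public ExceptionThresholdSketch(int maxExceptionsPerQueue) {
        this.maxExceptionsPerQueue = maxExceptionsPerQueue;
    }

    public synchronized void addUrl(String queueid, String url) {
        queues.computeIfAbsent(queueid, k -> new ArrayList<>()).add(url);
    }

    public synchronized void recordException(String queueid) {
        exceptionCounts.merge(queueid, 1, Integer::sum);
    }

    /**
     * Checks the exception count of a queue against the default threshold
     * and purges the queue if the threshold has been reached.
     *
     * @param queueid ID of the queue to check
     * @return number of queued URLs removed, or 0 if the queue was kept
     */
    public int checkExceptionThreshold(String queueid) {
        return checkExceptionThreshold(queueid, maxExceptionsPerQueue);
    }

    /**
     * Same check with an explicit threshold.
     *
     * @param queueid ID of the queue to check
     * @param maxExceptions threshold; -1 disables the check
     * @return number of queued URLs removed, or 0 if the queue was kept
     */
    public synchronized int checkExceptionThreshold(String queueid, int maxExceptions) {
        int exceptions = exceptionCounts.getOrDefault(queueid, 0);
        if (maxExceptions == -1 || exceptions < maxExceptions) {
            return 0;
        }
        // Threshold reached: drop the queue and reset its exception count.
        exceptionCounts.remove(queueid);
        List<String> purged = queues.remove(queueid);
        return purged == null ? 0 : purged.size();
    }
}
```

With a threshold of 2 and two queued URLs, the first recorded exception leaves the queue in place, and the second makes the check return 2.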
##########
File path: src/java/org/apache/nutch/fetcher/FetcherThread.java
##########
@@ -600,6 +628,12 @@ private FetchItem queueRedirect(Text redirUrl, FetchItem fit)
LOG.debug(" - ignoring redirect from {} to {} as duplicate", fit.url,
redirUrl);
return null;
+ } else if (fetchQueues.timelimitReached()) {
+ redirecting = false;
+ context.getCounter("FetcherStatus", "hitByTimeLimit").increment(1);
Review comment:
Same with this one: please document the new `hitByTimeLimit` counter on
https://cwiki.apache.org/confluence/display/NUTCH/Metrics
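For whoever updates the wiki, here is a simplified, hypothetical model of what this hunk does. Once the time limit is reached, the redirect is not followed and the `FetcherStatus`/`hitByTimeLimit` counter is incremented. In Nutch this goes through Hadoop's `context.getCounter(group, name).increment(1)`. To keep the sketch runnable without Hadoop, it uses a plain map keyed by `group:name`. `RedirectSketch` and `handleRedirect` are invented names.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free sketch of dropping redirects once the time limit is hit. */
public class RedirectSketch {

    /** Minimal stand-in for Hadoop counters, keyed by "group:name". */
    static final Map<String, Long> counters = new TreeMap<>();

    static void increment(String group, String name) {
        counters.merge(group + ":" + name, 1L, Long::sum);
    }

    /**
     * Decides whether to follow a redirect.
     *
     * @param timelimitReached whether the fetcher's time limit has passed
     * @param redirUrl the redirect target
     * @return the URL to follow, or {@code null} if the redirect is dropped
     */
    static String handleRedirect(boolean timelimitReached, String redirUrl) {
        if (timelimitReached) {
            // Mirrors context.getCounter("FetcherStatus", "hitByTimeLimit").increment(1)
            increment("FetcherStatus", "hitByTimeLimit");
            return null;
        }
        return redirUrl;
    }

    public static void main(String[] args) {
        handleRedirect(false, "https://example.org/a");
        handleRedirect(true, "https://example.org/b");
        System.out.println(counters); // {FetcherStatus:hitByTimeLimit=1}
    }
}
```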
##########
File path: src/java/org/apache/nutch/fetcher/FetcherThread.java
##########
@@ -312,6 +322,24 @@ public void run() {
outputRobotsTxt(robotsTxtContent);
robotsTxtContent.clear();
}
+ if (rules.isDeferVisits()) {
+ LOG.info("Defer visits for queue {} : {}", fit.queueID, fit.url);
+ // retry the fetch item
+ if (fetchQueues.timelimitReached()) {
+ fetchQueues.finishFetchItem(fit, true);
+ } else {
+ fetchQueues.addFetchItem(fit);
+ }
+ // but check whether it's time to cancel the queue
+ int killedURLs = fetchQueues.checkExceptionThreshold(
+ fit.getQueueID(), this.robotsDeferVisitsRetries + 1,
+ this.robotsDeferVisitsDelay);
+ if (killedURLs != 0) {
+ context.getCounter("FetcherStatus",
Review comment:
Can you please augment
https://cwiki.apache.org/confluence/display/NUTCH/Metrics with this new counter?
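Here is a simplified, single-queue model of the defer-visits branch that makes the retry arithmetic explicit. It assumes that `checkExceptionThreshold` counts one more exception per call and cancels the queue once the count reaches `robotsDeferVisitsRetries + 1`. The delay argument is not modeled, and `DeferVisitsSketch` is a hypothetical class. The counter name in the hunk is truncated, so the sketch only returns the number of dropped URLs that the caller would add to that counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, single-queue model of the "defer visits" branch in
 * FetcherThread.run(); names and threshold semantics are assumptions.
 */
public class DeferVisitsSketch {

    private final Deque<String> queue = new ArrayDeque<>();
    private final int robotsDeferVisitsRetries;
    private int exceptionCount = 0;

    public DeferVisitsSketch(int robotsDeferVisitsRetries) {
        this.robotsDeferVisitsRetries = robotsDeferVisitsRetries;
    }

    public void add(String url) { queue.addLast(url); }

    public String poll() { return queue.pollFirst(); }

    public int queueSize() { return queue.size(); }

    /** Counts one more exception and purges the queue once the count reaches maxExceptions. */
    private int checkExceptionThreshold(int maxExceptions) {
        exceptionCount++;
        if (maxExceptions != -1 && exceptionCount >= maxExceptions) {
            int killed = queue.size();
            queue.clear();
            return killed;
        }
        return 0;
    }

    /**
     * Handles a URL whose robots.txt asked the crawler to defer visits.
     *
     * @return number of URLs dropped because the queue was cancelled
     */
    public int deferVisit(String url, boolean timelimitReached) {
        if (!timelimitReached) {
            queue.addLast(url); // retry the fetch item later
        } // else: give up on the item instead of re-queuing it
        // but check whether it's time to cancel the queue
        return checkExceptionThreshold(robotsDeferVisitsRetries + 1);
    }

    public static void main(String[] args) {
        DeferVisitsSketch s = new DeferVisitsSketch(1); // allow one retry
        s.add("https://example.org/a");
        s.add("https://example.org/b");
        System.out.println(s.deferVisit(s.poll(), false)); // 0
        System.out.println(s.deferVisit(s.poll(), false)); // 2
    }
}
```

With one retry allowed, the second deferral on the queue cancels it and drops both queued URLs. That is the number that would feed the counter.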
--
This is an automated message from the Apache Git Service.