[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741225#action_12741225 ]
Doğacan Güney commented on NUTCH-721:
-
Thanks for the analysis, Julien! Can you make a
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Nioche updated NUTCH-721:
Attachment: NUTCH-721.patch
Sets the default value for fetcher.threads.per.host.by.ip to false
Hi Marko,
Thanks for the reply. Actually, it was an Eclipse problem.
It's working fine with ant.
-Ran
Marko Bauhardt-3 wrote:
hi ran
do you have the log4j.properties file in your classpath?
marko
On Aug 7, 2009, at 9:18 PM, ranjeet98 wrote:
Hi,
I am very new to the Nutch and I
I was wondering why Nutch was refetching pages that haven't changed in
a decade, when I discovered this code in
org.apache.nutch.protocol.http.HttpResponse.java:
if (datum.getModifiedTime() > 0) {
  reqStr.append("If-Modified-Since: " +
      HttpDateFormat.toString(datum.getModifiedTime()));
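For anyone following along, here is a rough standalone sketch of the same idea: only send If-Modified-Since when a modification time is actually known, and treat a 304 reply as "unchanged". This is not the Nutch code; the class name ConditionalGet and both method names are made up for illustration, and it assumes the modified time is stored as epoch milliseconds with 0 meaning "unknown".

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: builds a conditional-GET header line.
final class ConditionalGet {
    // HTTP dates use a fixed-width day of month and are always expressed in GMT.
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    // Returns the header line (with trailing CRLF), or "" when no
    // modification time is known (assumed here to be a value <= 0).
    static String ifModifiedSinceHeader(long modifiedTimeMillis) {
        if (modifiedTimeMillis <= 0) {
            return "";
        }
        return "If-Modified-Since: "
            + HTTP_DATE.format(Instant.ofEpochMilli(modifiedTimeMillis))
            + "\r\n";
    }

    // A 304 status means the server says the page has not changed,
    // so the body does not need to be downloaded again.
    static boolean isNotModified(int statusCode) {
        return statusCode == 304;
    }
}
```

If the stored modified time is never set, the first branch always wins, the header is never sent, and every page gets refetched in full.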
After applying the patch I sent earlier, I got it so that it correctly
skips downloading pages that haven't changed. And after doing the
generate/fetch/updatedb loop, and merging the segments with mergeseg,
dumping the segment file seems to show that it still has the old
content as well as the