[jira] Commented: (NUTCH-721) Fetcher2 Slow

2009-08-10 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741225#action_12741225 ] Doğacan Güney commented on NUTCH-721: Thanks for the analysis, Julien! Can you make a

[jira] Updated: (NUTCH-721) Fetcher2 Slow

2009-08-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-721: Attachment: NUTCH-721.patch Sets the default value for fetcher.threads.per.host.by.ip to false

Re: How to see System.out.println() values Featcher.java

2009-08-10 Thread ranjeet98
Hi Marko, Thanks for the reply. Actually it was an Eclipse problem. It's working fine with ant. -Ran Marko Bauhardt-3 wrote: hi ran do you have the log4j.properties file in your classpath? marko On Aug 7, 2009, at 9:18 PM, ranjeet98 wrote: Hi, I am very new to the Nutch and I

Is this a bug?

2009-08-10 Thread Paul Tomblin
I was wondering why Nutch was refetching pages that haven't changed in a decade, when I discovered this code in org.apache.nutch.protocol.http.HttpResponse.java: if (datum.getModifiedTime() > 0) { reqStr.append("If-Modified-Since: " + HttpDateFormat.toString(datum.getModifiedTime()));
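
For readers unfamiliar with conditional requests, here is a minimal sketch of the idea behind the quoted snippet: send an If-Modified-Since header only when a last-modified time is known, so the server can answer 304 Not Modified instead of resending the page. This is not Nutch's code; the class name ConditionalGet and the method ifModifiedSince are hypothetical, and the standard java.time formatter stands in for Nutch's HttpDateFormat.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of Nutch.
public class ConditionalGet {
    // HTTP date format, e.g. "Thu, 01 Jan 1970 00:00:01 GMT"
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Returns the header line (with CRLF), or an empty string when no
    // modification time is known (value <= 0).
    public static String ifModifiedSince(long modifiedTimeMillis) {
        if (modifiedTimeMillis <= 0) {
            return "";
        }
        return "If-Modified-Since: "
            + HTTP_DATE.format(Instant.ofEpochMilli(modifiedTimeMillis))
            + "\r\n";
    }
}
```

Appending the returned string to the request headers leaves the request unconditional when no modification time has been recorded.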

Why isn't this working?

2009-08-10 Thread Paul Tomblin
After applying the patch I sent earlier, I got it so that it correctly skips downloading pages that haven't changed. And after doing the generate/fetch/updatedb loop, and merging the segments with mergeseg, dumping the segment file seems to show that it still has the old content as well as the