Q: How to set up Eclipse projects to access Nutch?

2005-08-19 Thread Michael Scharf
Hi, I'd like to take a look at Nutch using Eclipse. I have Subclipse installed and I can see the Nutch repository: http://svn.apache.org/repos/asf/lucene/nutch It's not clear to me what to check out or how to set up Eclipse projects. Q: What should I check out (to see the Java source)? Q: Are there

Re: svn.apache.org down?

2005-08-19 Thread Doug Cutting
Jérôme Charron wrote: svn.apache.org down, or the problem is on my side? A good way to answer this is to look at: http://monitoring.apache.org/status/ It looks like SVN is currently up. And it works for me too. Doug

Failing JUnit test

2005-08-19 Thread Piotr Kosiorowski
Hello, I have updated my local copy today and JUnit tests started to fail. expected:<el> but was:<sv> junit.framework.ComparisonFailure: expected:<el> but was:<sv> at org.apache.nutch.analysis.lang.TestLanguageIdentifier.testIdentify(Unknown Source) at

Re: Failing JUnit test

2005-08-19 Thread Jérôme Charron
expected:<el> but was:<sv> junit.framework.ComparisonFailure: expected:<el> but was:<sv> As I suspect, it is a result of the latest updates to the LanguageIdentifier plugin or its tests. I am not deep into it, so I will not try to debug it myself at the moment; I just wanted you to know about the issue. You

Re: Failing JUnit test

2005-08-19 Thread Jérôme Charron
I am using JDK 1.5 on Windows. I can test it on 1.4 and 1.5 on Linux tomorrow; maybe this is the problem. OK. Thanks Jérôme -- http://motrech.free.fr/ http://www.frutch.org/

[jira] Closed: (NUTCH-20) Extract urls from plain texts

2005-08-19 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-20?page=all ] Jerome Charron closed NUTCH-20: --- Fix Version: 0.8-dev Resolution: Fixed Revision 233559 - http://svn.apache.org/viewcvs.cgi?rev=233559&view=rev * Add utility to extract urls from plain

MD5 in fetchlist / fetcher

2005-08-19 Thread Michael Ji
Hi there, I dumped the contents of segment/fetchlist and segment/fetcher. My question is: why isn't the MD5 signature of the page content saved in the fetchlist? I think it would save CPU time when a page is unchanged, because we could skip the parsing process. From my view, if