Hi,
I'd like to take a look at Nutch using Eclipse. I have Subclipse
installed and I can see the Nutch repository:
http://svn.apache.org/repos/asf/lucene/nutch
It's not clear to me what to check out and how
to set up the Eclipse projects.
Q: What should I check out (to see the Java source)?
Q: are there
Jérôme Charron wrote:
Is svn.apache.org down, or is the problem on my side?
A good way to answer this is to look at:
http://monitoring.apache.org/status/
It looks like SVN is currently up. And it works for me too.
Doug
Hello,
I have updated my local copy today and JUnit tests started to fail.
expected:<el> but was:<sv>
junit.framework.ComparisonFailure: expected:<el> but was:<sv>
at org.apache.nutch.analysis.lang.TestLanguageIdentifier.testIdentify(Unknown Source)
at
expected:<el> but was:<sv>
junit.framework.ComparisonFailure: expected:<el> but was:<sv>
I suspect it is a result of the latest updates to the LanguageIdentifier
plugin or its tests. I am not deep into it, so I will not try to debug it
myself at the moment; I just wanted you to know about the issue.
You
I am using JDK 1.5 on
Windows. I can test it on 1.4 and 1.5 on Linux tomorrow; maybe this is the
problem.
OK. Thanks
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
[ http://issues.apache.org/jira/browse/NUTCH-20?page=all ]
Jerome Charron closed NUTCH-20:
---
Fix Version: 0.8-dev
Resolution: Fixed
Revision 233559 - http://svn.apache.org/viewcvs.cgi?rev=233559&view=rev
* Add utility to extract urls from plain
hi there,
I dumped the contents of segment/fetchlist and
segment/fetcher.
My question is: why isn't the MD5 signature of the
page content saved in the fetchlist?
I think it would save CPU time when we see that a
page is unchanged, because we could skip the parsing
process. From my view, if