>You may find this interesting as well:
>http://maven.apache.org/using/multiproject.html

I'd rather suggest supporting Apache HttpClient: a large amount of
unnecessary code could then be removed from Nutch. We would not need to
calculate the "actual URL" after a redirect, since GetMethod does that for us.
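
A rough sketch of the idea of letting the HTTP layer resolve redirects for us. Note that this uses the JDK's own HttpURLConnection as a stand-in, not HttpClient's GetMethod, so that it runs without extra jars. The class and method names (RedirectSketch, resolveFinalUrl, demo) are made up for illustration, and the local test server exists only to make the example self-contained.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical helper, not part of Nutch or HttpClient.
class RedirectSketch {

    /** Issues a GET, lets the connection follow redirects, and returns the URL that finally answered. */
    static String resolveFinalUrl(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(true); // already the default, stated for clarity
            try {
                conn.getResponseCode();            // forces the request, including any redirects
                return conn.getURL().toString();   // now reflects the last URL fetched
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a throwaway local server whose /old redirects to /new, then resolves /old. */
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/old", ex -> {
                ex.getResponseHeaders().add("Location", "/new");
                ex.sendResponseHeaders(302, -1);   // -1: no response body
                ex.close();
            });
            server.createContext("/new", ex -> {
                ex.sendResponseHeaders(200, -1);
                ex.close();
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return resolveFinalUrl("http://127.0.0.1:" + port + "/old");
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point is only that the caller never parses a Location header itself; whether HttpClient would let us delete exactly the same code in Nutch would need checking against the current fetcher.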

Using HTTP HEAD can improve performance, among other benefits. Google uses
the HEAD method, as I noticed from my logs.
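
To make the HEAD idea concrete, here is a minimal sketch, again with the JDK's HttpURLConnection rather than HttpClient. It asks only for the headers, so a crawler could look at Content-Type before deciding whether a full GET is worth it. HeadRequestSketch, openHead and headContentType are hypothetical names, and the "fetch only if it looks like HTML" policy is my assumption, not something Nutch does today.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Nutch or HttpClient.
class HeadRequestSketch {

    /** Builds (but does not yet send) a HEAD request for the given URL. */
    static HttpURLConnection openHead(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD"); // headers only, the server sends no body
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends the HEAD request and returns the Content-Type header, or null if absent. */
    static String headContentType(String url) {
        HttpURLConnection conn = openHead(url);
        try {
            conn.connect();
            return conn.getContentType();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Usage: java HeadRequestSketch http://some.host/page.html
        String type = headContentType(args[0]);
        boolean worthFetching = type != null && type.startsWith("text/html");
        System.out.println(type + " -> fetch body: " + worthFetching);
    }
}
```

The saving comes from skipping the body of pages we would discard anyway; for pages we do fetch, it costs one extra round trip.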

What about the NekoHTML parser? The getTextHelper method seems very strange;
Java 5 does all of that itself (DOM Level 3). A new Parser plugin could be
based on http://htmlparser.sourceforge.net, and again we could remove the
buggy getOutlinks().
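
What I mean by "DOM Level 3 does it all" is Node.getTextContent(), which is part of org.w3c.dom in Java 5. A small sketch, assuming well-formed XHTML and the JDK's built-in XML parser in place of the NekoHTML DOM (TextContentSketch and extractText are made-up names):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Nutch.
class TextContentSketch {

    /** Parses well-formed markup and returns the concatenated text of all descendant nodes. */
    static String extractText(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            // DOM Level 3: walks the subtree and joins the text nodes,
            // skipping comments and processing instructions.
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractText("<p>Hello <b>Nutch</b> crawler</p>"));
        // prints: Hello Nutch crawler
    }
}
```

One caveat: getTextContent() does not know about HTML, so the text inside script or style elements would be included unless those nodes are removed first.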

I have experience with Maven and CruiseControl. All of Maven's stuff
(checkstyle, javadoc, xdoc, developer activity reports, etc.) could be run
via Ant. Not a first priority...
