Sami Siren wrote:
Hello,
It has been a while since the previous release (0.8.1), and looking at the
great fixes done in trunk I'd start thinking about baking a new release
soon.
Looking at the jira roadmaps there is 1 blocking issue (fixing the
license headers) for 0.8.2 and two other blocking issues for 0.9.0, of
which I think NUTCH-233 is safe to put in.
Agreed. The replacement regex mentioned in the original comment seems
safe enough, and simpler.
The top 10 voted issues are currently:
NUTCH-61 Adaptive re-fetch interval. Detecting unmodified content
Well ... I'm of two minds about this. I can bring this patch up to date
and apply it before 0.9.0, if we understand that this is a "0" release
... ;) Otherwise I'd prefer to wait and apply it right after the release.
I would also like to proceed with NUTCH-339 (Fetcher2 patches plus
some changes I made in the meantime), since I'd like to expose the new
fetcher to a broader audience, and it doesn't affect the existing
implementation.
NUTCH-48 "Did you mean" query enhancement/refinement feature
NUTCH-251 Administration GUI
NUTCH-289 CrawlDatum should store IP address
I'm still not entirely convinced about this - and there is already a
mechanism in place to support it if someone really wishes to keep this
particular info (CrawlDatum.metaData).
NUTCH-36 Chinese in Nutch
NUTCH-185 XMLParser is configurable xml parser plugin.
NUTCH-59 meta data support in webdb
NUTCH-92 DistributedSearch incorrectly scores results
This is too intrusive to fix just before the release - and needs
additional discussion.
NUTCH-68 A tool to generate arbitrary fetchlists
Easy to port this to 0.9.0 - I can do this.
NUTCH-87 Efficient site-specific crawling for a large number of sites
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com