> I'm starting a proof of concept conversion to use CLucene to
> replace the db.words.db ... it will still use BDB for the other db
> files.
Let me quickly report what I've learned as the Debian maintainer for Java Lucene for the past 6 months.

0) Free software Java platforms like Kaffe can't build or run Java Lucene yet, so the toolchain is still tied to Sun's proprietary compiler/JVM. But it's fairly close and getting closer. Debian cares about this; it may be less relevant for other platforms.

1) Java Lucene is a fast-moving target. Point releases come every couple of months and typically break a few things, and a big revision toward Lucene 2.0 is in the works. Lucene was just approved as a top-level Apache Software Foundation project, and the momentum is huge.

2) Language ports like CLucene and C# Lucene seem to be lagging somewhat. The author of PyLucene told me his (quite convoluted) build process starts with Java Lucene instead of CLucene for exactly this reason.

3) A Java application using Java Lucene can be compiled to native code trivially with gcj, at least on Linux. Compilation is a one-liner, and the resulting native binary is really fast. I'm thinking about shipping gcj-compiled binaries of the Lucene demo programs with Debian.

4) According to the PyLucene author, compiling Java Lucene into a native library with gcj and then calling that library from a C program is hopelessly hairy and not recommended. Too bad.

5) To my eye, Nutch does not look particularly rich in features or configurability compared to HtDig.

6) Word on the street is that Xapian is Lucene's only real competition in scalability among Free Software search cores. Gmane uses Xapian on 20+ million documents.

Anyway, I'm delighted to hear about this HtDig/Lucene experiment. Points #1, #2, and #3 suggest it may make sense to consider a pure Java HtDig that can be gcj-compiled to native executables. From my perspective as a naive HtDig user, I think that would rock, but there's probably lots of stuff I'm not thinking about.
If anyone wants to try out the gcj/Lucene thing, Doug Cutting's instructions [*] work fine provided you have gcj 3.4.x installed.

[*] http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg09089.html

Cheers,
Jeff

ht://Dig Developer mailing list: htdig-dev@lists.sourceforge.net