>        I'm starting a proof of concept conversion to use CLucene to
>replace the db.words.db ... it will still use BDB for the other db
>files.

Let me quickly report what I've learned as the Debian maintainer for
Java Lucene for the past 6 months.

0) Free software Java platforms like Kaffe can't build or run Java
   Lucene yet. So the toolchain is still linked to Sun's proprietary
   compiler/JVM. But it's fairly close and getting closer. Debian
   cares about this; it may be less relevant for other platforms.

1) Java Lucene is a pretty fast-moving target. Point releases come out
   every couple of months and typically break a few things, and there
   is a big revision in the works towards Lucene 2.0. Lucene just got
   approved as a top level Apache Foundation project and the momentum
   is huge.

2) Language ports like CLucene and C# Lucene seem to be lagging
   somewhat. The author of PyLucene told me his (quite convoluted)
   build process starts with Java Lucene instead of CLucene for
   exactly this reason.

3) A Java application using Java Lucene can be compiled to native 
   code trivially using gcj, at least on Linux. Compilation is a 
   one liner and the resulting native binary is really fast. I'm
   thinking about shipping gcj compiled binaries for the Lucene
   demo programs with Debian.

4) According to the PyLucene author, converting Java Lucene to a 
   native library with gcj, then calling that library from a C
   program is hopelessly hairy and not recommended. Too bad.

5) To my eye, Nutch does not look particularly rich in features or 
   configurability compared to HtDig.

6) Word on the street is that Xapian is the only competition to Lucene
   in terms of scalability among Free Software search cores. Gmane
   uses Xapian against 20+ million documents.

Anyway, I'm delighted to hear about this HtDig/Lucene experiment.
Points #1, #2, and #3 suggest it may make sense to consider the idea
of a pure Java HtDig which can be gcj compiled to native executables. From
my perspective as a naive HtDig user I think that would rock, but
there's probably lots of stuff I'm not thinking about. If anyone wants
to try out the gcj/Lucene thing, Doug Cutting's instructions [*] work
fine provided you have gcj 3.4.x installed.

[*] http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg09089.html
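
To give a feel for the "one liner" in point #3, here is a minimal sketch.
It is not a reproduction of Doug Cutting's instructions (those are in the
message linked above). The WordIndex class, its methods, and the file and
binary names are hypothetical stand-ins for a small Java search program,
and it deliberately uses no Lucene classes so it builds without extra
jars. It sticks to Java 1.4 style (no generics), on the assumption that
gcj 3.4.x is the compiler.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical stand-in for a small Java search application.
// Assumed native build with gcj (file and binary names are illustrative):
//   gcj -O2 --main=WordIndex -o wordindex WordIndex.java
//   ./wordindex
class WordIndex {
    // Maps a lowercased word to the List of document names containing it.
    private final Map postings = new HashMap();

    // Split the text on whitespace and record the document under each word.
    public void add(String docName, String text) {
        StringTokenizer tokens = new StringTokenizer(text.toLowerCase());
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            List docs = (List) postings.get(word);
            if (docs == null) {
                docs = new ArrayList();
                postings.put(word, docs);
            }
            if (!docs.contains(docName)) {
                docs.add(docName);
            }
        }
    }

    // Return the documents containing the word, in the order they were added.
    public List lookup(String word) {
        List docs = (List) postings.get(word.toLowerCase());
        return docs == null ? new ArrayList() : docs;
    }

    public static void main(String[] args) {
        WordIndex index = new WordIndex();
        index.add("a.html", "ht://Dig indexes web pages");
        index.add("b.html", "Lucene indexes documents");
        System.out.println(index.lookup("indexes")); // prints [a.html, b.html]
    }
}
```

The same source also compiles and runs on an ordinary JVM, which makes it
easy to compare the two builds side by side.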

Cheers,
Jeff


_______________________________________________
ht://Dig Developer mailing list:
htdig-dev@lists.sourceforge.net
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
