4) According to the PyLucene author, converting Java Lucene to a
  native library with gcj, then calling that library from a C
  program is hopelessly hairy and not recommended. Too bad.

5) To my eye, Nutch does not look particularly rich in features or
  configurability compared to HtDig.

True, but its known scalability to 200+ million documents can't be beat. And it's being supported by Yahoo Labs.


6) Word on the street is that Xapian is Lucene's only real competition
  in scalability among Free Software search cores. Gmane uses Xapian
  against 20+ million documents.

Xapian is GPL; Lucene/CLucene is LGPL. Evidently the Xapian people didn't read the 4th paragraph of http://www.gnu.org/philosophy/why-not-lgpl.html


"Using the ordinary GPL is not advantageous for every library. There are reasons that can make it better to use the Library GPL in certain cases. The most common case is when a free library's features are readily available for proprietary software through other alternative libraries. In that case, the library cannot give free software any particular advantage, so it is better to use the Library GPL for that library."


  Of course you (Jeff) and I disagreed on this point a while back ;-)

  That said Xapian does look impressive.

Anyway, I'm delighted to hear about this HtDig/Lucene experiment.
Points #1, #2, and #3 suggest it may make sense to consider the idea
of a pure Java HtDig which can be gcj compiled to native executables. From
my perspective as a naive HtDig user I think that would rock, but
there's probably lots of stuff I'm not thinking about. If anyone wants
to try out the gcj/Lucene thing, Doug Cutting's instructions [*] work
fine provided you have gcj 3.4.x installed.

If we really wanted a pure Java HtDig, I think we'd be better off throwing in with Nutch and adding the configurability of HtDig to it.


As I see it, the primary reason that Nutch is somewhat unattractive to the average HtDig user is that they must know how to configure Nutch to run as a Tomcat service, or know how to tweak the build system to build it as a standalone server. Neither is easy for a more novice user, given the current Nutch build system and 'How-To' docs.

HtDig is still a forked CGI app, which means that our users don't have to worry about starting/monitoring a server daemon. If we were to throw in with Nutch at some future date, it would be nice to add a simple option to build Nutch as a forked CGI app.

I've looked at going the PyLucene route: compiling Java with gcj and creating the hairy wrapper libs for it. It is ugly for many reasons.

Starting with CLucene has the advantage that we can get the code reorg done first, and later look at replacing the CLucene APIs with the equivalent Java Lucene + wrapper ones, if that is even worth doing.

  Thanks.

--
Neal Richter Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485




_______________________________________________
ht://Dig Developer mailing list:
htdig-dev@lists.sourceforge.net
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
