> > Neal, are you tracking the Java Lucene dev lists? There's
> > some recent discussion with respect to index interoperability
> > that may be relevant.
>
>   Not yet... just the CLucene list.  I'll have a look.

Here are some starting points, maybe worth half an eyeball:

The UTF-8 interoperability thread
http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html

Interoperability with Perl Lucene
http://www.mail-archive.com/java-dev@lucene.apache.org/msg02187.html

Features in the approaching Java Lucene 1.9
http://www.mail-archive.com/java-dev@lucene.apache.org/msg02284.html

Debian & Kaffe, Redhat & GCJ
http://www.mail-archive.com/java-dev@lucene.apache.org/msg02092.html

>   We have been able to verify that the Java Lucene tool 'luke' is able to
> read and query the indexes produced by CLucene.  Very cool.
>
>   The names of the searchable fields we are using at this point are likely
> different from Nutch's.  Might be worth a look to see how different.

As of Nutch 0.7.1, the crawler + indexer is getting close. If it had
easy-to-configure equivalents of HtDig's "local_urls" and
"<!--htdig_noindex-->" features, I think it would probably be good
enough. Running Java for these operations does not feel like such
a big deal, and maybe there would be GCJ magic to ease the pain.

The search portion is a different story, and requiring Tomcat is kind of
a pain in the butt. If some miracle occurred and htdig 4.0 and Nutch
were super-compatible, I could imagine wanting to use htsearch against
a Nutch-built index. Dropping a search program into cgi-bin is really
convenient.

>   If you look at the 4.0 cvs branch, we've devised a pretty cool method of
> using an STL map container to hold the fieldname & fieldtext pairs with
> index/noindex and store/nostore flags.  These are filled per document
> during htdig's parsing.
>
>   It makes the htdig<->clucene interface very elegant.

I'm a straight C guy, so STL is a little beyond me. But I like the sound
of elegant and am tracking the blog.
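
For anyone else following along, here is roughly how I picture the
structure described above. This is only a guess at the shape, not code
from the 4.0 CVS branch: the names FieldEntry, DocFields, add_field and
count_indexed are all made up for illustration, and the real interface
may look quite different.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch; none of these names come from the htdig sources.
struct FieldEntry {
    std::string text;  // field text collected while parsing the document
    bool index;        // true = searchable, false = noindex
    bool store;        // true = keep the text in the index, false = nostore
};

// One map per document, keyed by field name.
typedef std::map<std::string, FieldEntry> DocFields;

// Record (or overwrite) one field for the current document.
void add_field(DocFields &doc, const std::string &name,
               const std::string &text, bool index, bool store)
{
    assert(!name.empty());  // a field needs a name to be a map key
    FieldEntry entry;
    entry.text = text;
    entry.index = index;
    entry.store = store;
    doc[name] = entry;
}

// Count how many fields of the document are flagged as searchable.
int count_indexed(const DocFields &doc)
{
    int n = 0;
    for (DocFields::const_iterator it = doc.begin(); it != doc.end(); ++it) {
        if (it->second.index)
            ++n;
    }
    return n;
}
```

The parser would call add_field() once per field as it walks a document,
and the indexing side would then loop over the map and hand each entry
to the index writer according to its two flags.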


_______________________________________________
ht://Dig Developer mailing list:
htdig-dev@lists.sourceforge.net
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
