> > Neal, are you tracking the Java Lucene dev lists? There's
> > some recent discussion with respect to index interoperability
> > that may be relevant.
>
> Not yet... just the Clucene list. I'll have a look.

Here are some starting points maybe worth half an eyeball:

The UTF-8 interoperability thread
http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html

Interoperability with Perl Lucene
http://www.mail-archive.com/java-dev@lucene.apache.org/msg02187.html

Features in the approaching Java Lucene 1.9
http://www.mail-archive.com/java-dev@lucene.apache.org/msg02284.html

Debian & Kaffe, Redhat & GCJ
http://www.mail-archive.com/java-dev@lucene.apache.org/msg02092.html

> We have been able to verify that the Java Lucene tool 'luke' is able to
> read and query the indexes produced by CLucene.

Very cool.

> The names of the searchable-fields we are using at this point is likely
> different than nutch.

Might be worth a look to see how different.

As of Nutch 0.7.1, the crawler + indexer is getting close. If it had an
easy-to-configure equivalent to HtDig's "local_urls" and
"<!--htdig_noindex-->" features, I think it would probably be good
enough. Running Java for these operations does not feel like such a big
deal, and maybe there would be GCJ magic to ease the pain.

The search portion is a different story, and requiring Tomcat is kind of
a pain in the butt. If some miracle occurred and htdig 4.0 and Nutch
were super-compatible, I could imagine wanting to use htsearch against a
Nutch-built index. Dropping a search program into cgi-bin is really
convenient.

> If you look at the 4.0 cvs branch, we've devised a pretty cool method of
> using an STL map container to hold the fieldname & fieldtext pairs with
> index/noindex and store/nostore flags. These are filled per document
> during htdig's parsing.
>
> It makes the htdig<->clucene interface very elegant.

I'm a straight C guy, so STL is a little beyond me. But I like the sound
of elegant and am tracking the blog.

_______________________________________________
ht://Dig Developer mailing list: htdig-dev@lists.sourceforge.net
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev