> > After having looked at many commercial implementations of search engines
> > over the past few years and following Nutch a bit, I am still convinced
> > that HtDig has plenty of legs.
>
> I know what you mean. Every time I look at Nutch I decide
> to stick with htdig 3.1.6 a little longer. However, UTF-8 support
> is getting super critical and some time in 2006 I'm going to have
> to bite the bullet and do something.

Exactly the impetus for the 4.0 development. I need Unicode in 2006 as well.

> Neal, are you tracking the Java Lucene dev lists? There's
> some recent discussion with respect to index interoperability
> that may be relevant.

Not yet, just the CLucene list. I'll have a look.

We have been able to verify that the Java Lucene tool 'luke' can read
and query the indexes produced by CLucene. Very cool.

The names of the searchable fields we are using at this point are likely
different from Nutch's. It might be worth a look to see how different.

If you look at the 4.0 CVS branch, we've devised a pretty cool method of
using an STL map container to hold the field name and field text pairs,
with index/noindex and store/nostore flags. These are filled per document
during htdig's parsing. It makes the htdig<->CLucene interface very elegant.

Thanks

--
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

_______________________________________________
ht://Dig Developer mailing list: htdig-dev@lists.sourceforge.net
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
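
For readers who have not looked at the 4.0 branch, the per-document field map described in the message could look roughly like the sketch below. This is not the code from CVS: the names FieldEntry, DocFields, parse_document_sketch and count_indexed, the field names, and the sample values are all made up for illustration. It only shows the general idea of an STL map from field name to field text, with an index/noindex flag and a store/nostore flag on each entry.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-field record: the field text plus the two flags
// mentioned in the message (index/noindex and store/nostore).
struct FieldEntry {
    std::string text;
    bool index;  // true: field is searchable
    bool store;  // true: original text is kept for retrieval
};

// Hypothetical name for the per-document container: field name -> entry.
using DocFields = std::map<std::string, FieldEntry>;

// Stand-in for the parser, which would fill the map once per document.
// Field names and values here are invented examples.
DocFields parse_document_sketch() {
    DocFields doc;
    doc["url"]      = {"http://example.com/", false, true};
    doc["title"]    = {"Example page",        true,  true};
    doc["contents"] = {"body text of page",   true,  false};
    return doc;
}

// Count the fields that an indexer would treat as searchable.
std::size_t count_indexed(const DocFields& doc) {
    std::size_t n = 0;
    for (const auto& kv : doc) {
        if (kv.second.index) {
            ++n;
        }
    }
    return n;
}
```

With a structure like this, the code that talks to the index library only has to walk the map once per document and translate each entry and its two flags into whatever field type the library expects, which is presumably what keeps the interface small.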