Re: [htdig-dev] Checking in

Neal Richter Thu, 20 Oct 2005 14:22:28 -0700


> > After having looked at many commercial implementations of search engines
> > over the past few years and following Nutch a bit, I am still convinced
> > that HtDig has plenty of legs.
> 
> I know what you mean. Every time I look at Nutch I decide
> to stick with htdig 3.1.6 a little longer. However, UTF-8 support
> is getting super critical and some time in 2006 I'm going to have
> to bite the bullet and do something.


  Exactly the impetus for the 4.0 development.  I need Unicode in 2006 as 
well.
 
> Neal, are you tracking the Java Lucene dev lists? There's
> some recent discussion with respect to index interoperability
> that may be relevant.

  Not yet... just the CLucene list.  I'll have a look.

  We have been able to verify that the Java Lucene tool 'luke' is able to 
read and query the indexes produced by CLucene.  Very cool.

  The names of the searchable fields we are using at this point are likely 
different from the ones Nutch uses.  Might be worth a look to see how different.

  If you look at the 4.0 cvs branch, we've devised a pretty cool method of 
using an STL map container to hold the fieldname & fieldtext pairs with 
index/noindex and store/nostore flags.  These are filled per document 
during htdig's parsing.

  It makes the htdig<->clucene interface very elegant.
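
  Roughly, the idea looks like the sketch below.  This is not the code from 
the 4.0 branch, just an illustration: the names FieldEntry, DocFields and 
add_field are made up here, and the append-on-repeat behaviour is an 
assumption about how a parser might fill the map.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical names for illustration; not taken from the htdig 4.0 branch.
struct FieldEntry {
    std::string text;  // field text collected during parsing
    bool index;        // true = index, false = noindex
    bool store;        // true = store, false = nostore
};

// One map per document, keyed by field name.
typedef std::map<std::string, FieldEntry> DocFields;

// Add a field, or append to its text if the parser has already seen it.
// The flags from the first call are kept (an assumption of this sketch).
void add_field(DocFields &doc, const std::string &name,
               const std::string &text, bool index, bool store)
{
    DocFields::iterator it = doc.find(name);
    if (it == doc.end()) {
        FieldEntry e;
        e.text = text;
        e.index = index;
        e.store = store;
        doc.insert(std::make_pair(name, e));
    } else {
        it->second.text += " " + text;
    }
}
```

  The parser calls something like add_field() as it walks the document, and 
the indexing side only has to iterate over the finished map and pass each 
name, text and flag pair along, without knowing anything about the parser.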

  Thanks

-- 
Neal Richter
Sr. Researcher and Machine Learning Lead
Software Development
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485


