According to Ravi Starzl:
> I have been working with htDig for almost a year now, and see a dramatic 
> difference in the performance between version 3.1.x and 3.2.x. I am 
> currently forced to use 3.2.x, because phrase searching is a necessity 
> for the projects that I work on, however, the performance handicaps of 
> 3.2.x (indexing times and htmerge instability) make using it very time 
> consuming for large-scale indexes (over 50,000 documents).  How 
> difficult or how far away is the addition of phrase searching to the 
> stable release? That is the key feature for several people I have talked 
> to that would be the biggest improvement in the stable release. I've 
> worked somewhat-extensively with other information retrieval systems - 
> would it be easier to use the phrase search code of another public 
> system as a template for implementing phrase capability?
> 
> I would love to contribute to the development effort if I could get some 
> specific direction on what would need changing in htDig to enable 
> phrases.

I think the question would more appropriately be posed "how far away
is the 3.2.x code from being stable?"  Backporting phrase matching to
the 3.1.x branch is absolutely out of the question.  In order to support
phrase matching, the database structure had to be completely revamped in
the transition from 3.1 to 3.2.  Much of the inefficiency in 3.2 is
a direct result of those changes.

So, if you'd like to contribute to 3.2 development, the most pressing
need is to merge in the latest mifluz code, which supports the new
word database format.  That should take care of some of the instability
and probably some of the inefficiency.  Next would be to optimize htmerge's
handling of the wordlist, using the new database walking capabilities in
the latest mifluz, so it doesn't try to store the whole database in
memory at once (that's the main reason htmerge is unstable right now).
Finally, we need to port many of 3.1.6's new features over to 3.2, and
resume work on the to-do list for 3.2.
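
To illustrate the htmerge point, here is a rough sketch of the general
idea of walking sorted word records one at a time instead of loading
them all into memory.  It is not mifluz's API and not htmerge's actual
code: the (word, document id) record layout, the function names and the
"discarded_docs" set are all assumptions made up for the example.

```python
import heapq
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical record layout: (word, document id). The real word
# database stores more per entry; this is only for illustration.
WordRecord = Tuple[str, int]


def walk_live_records(records: Iterable[WordRecord],
                      discarded_docs: Set[int]) -> Iterator[WordRecord]:
    """Yield records one at a time, skipping discarded documents."""
    for word, doc_id in records:
        if doc_id not in discarded_docs:
            yield word, doc_id


def merge_wordlists(wordlists: Iterable[Iterable[WordRecord]],
                    discarded_docs: Set[int]) -> Iterator[WordRecord]:
    """Merge already-sorted word lists lazily, dropping exact duplicates.

    heapq.merge keeps only one pending record per input list, so memory
    use depends on the number of lists, not on their total size.
    """
    previous = None
    merged = heapq.merge(*wordlists)
    for record in walk_live_records(merged, discarded_docs):
        if record != previous:
            yield record
            previous = record
```

The caller would write each record out as it is yielded, so nothing
larger than one record per input list is ever held at once.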

Probably the best way to get up to speed on all this would be to study
the latest 3.2.0b4 development snapshot to see how things work right
now, and to review the htdig-dev mailing list archives to see what issues
the developers have been discussing recently.


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
