On Thu, 19 Sep 2002, Gilles Detillieux wrote:

> According to Neal Richter:
> > 1. Add a new config verb to let users use zlib WordDB-page compression.
> > This would be an option to let users who run into this error:
> >
> > FATAL ERROR:Compressor::get_vals invalid comptype
> > FATAL ERROR at file:WordBitCompress.cc line:827 !!!
> >
> > If you look into the db/mp_cmpr.c code (Loic's Compressed BDB page code)
> > you'll find these two functions:
> > CDB___memp_cmpr_inflate(..)
> > CDB___memp_cmpr_deflate(...)
> ...
> > Merging Loic's latest mifluz is supposed to fix this problem (Geoff
> > and I have been working on this), but so far the merge is fairly complex
> > and needs much more work and long-term testing. This is a decent
> > solution.
>
> Sounds reasonable as an interim solution. I wonder, though, if
> it wouldn't be a quicker/easier fix to backport just the inflate and
> deflate code from the latest mifluz package to the existing 3.2.0b4 code.
> Would that fix this particular problem without all the headaches of
> merging in all the latest mifluz code?

I tried to do just that (independently of Geoff). Unfortunately, Loic
basically reimplemented and restructured so much code that it's very hard
to divide the merge into smaller pieces.

>
> > 2. The inverted index is not very efficient in general.
> >
> > The current scheme:
> >
> > WORD     DOCID  LOCATION
> > affect   323    43
> > affect   323    53
> ...
> > A more efficient inverted system:
> >
> > affect   323    43, 53
> ...
> > If the fixed-width LOCATION field were around 256 characters, this would
> > allow roughly 40-50 location codes of 1, 2, 3 or 4 digits, so the vast
> > majority of the time a second row would not be needed. For large
> > documents more rows would be needed, but the scheme would still be
> > much more efficient.
> >
> > Eh? Feedback?
>
> Sounds like an excellent idea to me. I'm rather surprised they didn't do
> that in mifluz already (or is something like this in the newer code?).
> This does mean a deviation from the mifluz code base, but it seems
> that's inevitable anyway, given the efforts to crowbar the latest code
> into ht://Dig, and the lack of support from the mifluz developers.

I'm being somewhat conservative here. Given the total lack of recent
progress in mifluz, the very quiet mailing list, and the much smaller user
base, I have some worries about just how good the new mifluz code is. I
would like to see parallel development for a while, until we get the
mifluz-merge tree VERY solid. Maybe even finish 3.2 without the merge and
start the merge in 3.3?

> I guess it also means making the change twice - once in the current
> ht://Dig code and again after the mifluz code merge. Or is all this
> at a level that can be done with minimal changes after the merge?

Probably both, but the second port should be pretty straightforward.

--
Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
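The compaction proposed in item 2 (one row per WORD/DOCID pair, with the locations collected into a fixed-width LOCATION field that spills into a second row only when it fills up) can be sketched as follows. This is a minimal illustration under stated assumptions, not ht://Dig or mifluz code. The 256-character width and the ", " separator come from the example rows in the thread. The names `LOCATION_FIELD_WIDTH`, `pack_locations` and `compact_index` are hypothetical. A real WordDB key would almost certainly store locations in a binary, compressed form rather than as text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical constant: the "around 256 characters" field width from the
# proposal. It is not an actual ht://Dig configuration attribute.
LOCATION_FIELD_WIDTH = 256
SEPARATOR = ", "  # matches the "affect 323 43, 53" example row


def pack_locations(locations: List[int], width: int = LOCATION_FIELD_WIDTH) -> List[str]:
    """Pack sorted location codes into as few fixed-width fields as possible.

    Each returned string is at most `width` characters. When the next code
    would not fit, a new field (i.e. a second row) is started.
    """
    rows: List[str] = []
    current = ""
    for loc in sorted(locations):
        code = str(loc)
        if len(code) > width:
            raise ValueError(f"location code {code} is wider than the field")
        candidate = code if not current else current + SEPARATOR + code
        if len(candidate) <= width:
            current = candidate
        else:
            rows.append(current)  # field is full: close this row
            current = code        # and start the next one
    if current:
        rows.append(current)
    return rows


def compact_index(postings: Iterable[Tuple[str, int, int]]) -> List[Tuple[str, int, str]]:
    """Turn (word, docid, location) rows into (word, docid, location-list) rows."""
    grouped: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for word, docid, location in postings:
        grouped[(word, docid)].append(location)
    compacted: List[Tuple[str, int, str]] = []
    for (word, docid), locs in sorted(grouped.items()):
        for field in pack_locations(locs):
            compacted.append((word, docid, field))
    return compacted


if __name__ == "__main__":
    rows = [("affect", 323, 43), ("affect", 323, 53)]
    print(compact_index(rows))  # [('affect', 323, '43, 53')]
```

The "roughly 40-50" estimate is easy to check against this layout. With ", " separators, n four-digit codes take 6n - 2 characters, so 43 of them fill a 256-character field exactly and the 44th starts a second row. Shorter codes pack more densely, which is where the upper end of the range comes from.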