Thanks. I'll give this page a test.
What page sizes are you seeing the errors on? I.e., what is your wordlist_page_size set to?

Thanks again.

On Wed, 19 Feb 2003, Lachlan Andrew wrote:

> On Friday 14 February 2003 11:16, Neal Richter wrote:
>
> > Is there something you can tell us about the type of data you are
> > indexing? Are they big pages with lots of repetitive information..
> > giving htdig many similar keys which hash/sort to the same pages?
>
> Greetings Neal,
>
> I've found one page in the qt documentation which may be causing
> those problems (attached). I hadn't realised it, but the
> valid_punctuation attribute seems to be treated as an *optional*
> word break. (The docs say it is *not* a word break, and that seems
> the intention of WordType::WordToken...) The page has long strings
> with many valid_punctuation symbols, and gives output like
>
> elliptical 1060 0 1113 34
> elp 1363 0 131 0
> elphick 1516 0 750 0
> elsbs 1372 0 968 4
> elsbsw 1372 0 968 4
> elsbswp 1372 0 968 4
> elsbswpe 1372 0 968 4
> elsbswpew 1372 0 968 4
> elsbswpewg 1372 0 968 4
> elsbswpewgr 1372 0 968 4
> elsbswpewgrr 1372 0 968 4
> elsbswpewgrr1 1372 0 968 4
> elsbswpewgrr1t 1372 0 968 4
> elsbswpewgrr1twa7 1372 0 968 4
> elsbswpewgrr1twa7z 1372 0 968 4
> elsbswpewgrr1twa7z1bea0 1372 0 968 4
> elsbswpewgrr1twa7z1bea0f 1372 0 968 4
> elsbswpewgrr1twa7z1bea0fk 1372 0 968 4
> elsbswpewgrr1twa7z1bea0fkd 1372 0 968 4
> elsbswpewgrr1twa7z1bea0fkdrbk 1372 0 968 4
> elsbswpewgrr1twa7z1bea0fkdrbke 1372 0 968 4
> elsbswpewgrr1twa7z1bea0fkdrbkezb 1372 0 968 4
> else 225 0 1285 0
>
> Might that be the trouble?
>
> (BTW, zlib 1.1.4 is still giving errors, albeit for a slightly
> different data set.)
>
> Cheers,
> Lachlan

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485
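For anyone following the thread, here is a rough sketch of the behaviour Lachlan describes, where each valid_punctuation character acts as an *optional* word break and every prefix ending at one is indexed along with the full word. This is not the ht://Dig code in WordType::WordToken. The function name, the punctuation set and the sample string are all made up for illustration, since the actual string from the qt page is not shown above.

```python
# Illustration only: NOT ht://Dig's tokenizer. The function name, the
# punctuation set and the sample input are hypothetical.

def tokens_with_optional_breaks(text, valid_punctuation=".-_"):
    """Return the words produced if each valid_punctuation character is an
    optional word break: every prefix ending at such a character is emitted
    (punctuation stripped), followed by the whole word."""
    out = []
    letters = []
    for ch in text:
        if ch in valid_punctuation:
            prefix = "".join(letters)
            # Skip empty prefixes and repeats caused by adjacent punctuation.
            if prefix and (not out or out[-1] != prefix):
                out.append(prefix)
        else:
            letters.append(ch)
    word = "".join(letters)
    if word and (not out or out[-1] != word):
        out.append(word)
    return out


if __name__ == "__main__":
    # Made-up string with many punctuation symbols, in the spirit of the
    # "elsbs..." run in the quoted output.
    for w in tokens_with_optional_breaks("elsbs-w-p-e-w"):
        print(w)
```

Under that assumption, one long string with n punctuation symbols turns into up to n+1 nearly identical keys that all sort next to each other, which would fit the pattern in the quoted word list.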
