Thanks.  I'll give this page a test.

What page sizes are you seeing the errors on?  That is, what is your
wordlist_page_size set to?

Thanks again.

On Wed, 19 Feb 2003, Lachlan Andrew wrote:

> On Friday 14 February 2003 11:16, Neal Richter wrote:
>
> > Is there something you can tell us about the type of data you are
> > indexing?  Are they big pages with lots of repetitive information..
> > giving htdig many similar keys which hash/sort to the same pages?
>
> Greetings Neal,
>
> I've found one page in the  qt  documentation which may be causing
> those problems (attached).  I hadn't realised it, but the
> valid_punctuation  attribute seems to be treated as an *optional*
> word break.  (The docs say it is *not* a word break, and that seems
> the intention of  WordType::WordToken...)  The page has long strings
> with many valid_punctuation symbols, and gives output like
>
> elliptical    1060    0       1113    34
> elp   1363    0       131     0
> elphick       1516    0       750     0
> elsbs 1372    0       968     4
> elsbsw        1372    0       968     4
> elsbswp       1372    0       968     4
> elsbswpe      1372    0       968     4
> elsbswpew     1372    0       968     4
> elsbswpewg    1372    0       968     4
> elsbswpewgr   1372    0       968     4
> elsbswpewgrr  1372    0       968     4
> elsbswpewgrr1 1372    0       968     4
> elsbswpewgrr1t        1372    0       968     4
> elsbswpewgrr1twa7     1372    0       968     4
> elsbswpewgrr1twa7z    1372    0       968     4
> elsbswpewgrr1twa7z1bea0       1372    0       968     4
> elsbswpewgrr1twa7z1bea0f      1372    0       968     4
> elsbswpewgrr1twa7z1bea0fk     1372    0       968     4
> elsbswpewgrr1twa7z1bea0fkd    1372    0       968     4
> elsbswpewgrr1twa7z1bea0fkdrbk 1372    0       968     4
> elsbswpewgrr1twa7z1bea0fkdrbke        1372    0       968     4
> elsbswpewgrr1twa7z1bea0fkdrbkezb      1372    0       968     4
> else  225     0       1285    0
>
> Might that be the trouble?
>
> (BTW, zlib 1.1.4 is still giving errors, albeit for a slightly
> different data set.)
>
> Cheers,
> Lachlan
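
To make the pattern in the quoted output concrete, here is a minimal
sketch of how treating valid_punctuation as an *optional* word break
could produce a run of ever-longer keys.  This is not htdig's actual
code: the function name, the input string and the choice of "_" as the
punctuation character are all assumptions made for illustration only.

```python
def cumulative_prefixes(token, valid_punctuation):
    """Hypothetical illustration, not htdig's implementation.

    Each valid_punctuation character is dropped from the word, but is
    also treated as a point where the word *may* end, so the stripped
    prefix seen so far is emitted as a key of its own.
    """
    words = []
    current = ""
    for ch in token:
        if ch in valid_punctuation:
            # Optional break: emit the prefix, but keep accumulating
            # instead of starting a new word.
            if current and (not words or words[-1] != current):
                words.append(current)
        else:
            current += ch
    # Emit the full stripped word unless it was already emitted.
    if current and (not words or words[-1] != current):
        words.append(current)
    return words


# Assumed input; the real string on the attached page is not shown here.
print(cumulative_prefixes("elsbs_w_p_e", "_"))
```

With N punctuation symbols in one string, a scheme like this emits on
the order of N keys that all share a long common prefix, which would be
consistent with many similar keys sorting onto the same pages.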

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485



