According to Philippe Ramkvist-Henry:
> I'm having problems with some foreign chars when using htdig to index and
> search a Swedish site. The locale is set right (sv) and is working in
> other applications. The problem I have is somewhat weird, maybe it has
> something to do with "uppercase" "lowercase"?
> 
> Well, I can search words like "�sa,�sa,�l,�l" and get the same matches.
> But when I try to search "b�st" I get no hits. With "b�st" I get several
> hits...

Are the hits all capitalized, or do some of them have the lowercase �?
Does this problem happen consistently with certain accented letters, and
not others?  Do you have certain uppercase letters appearing in db.wordlist?

> I asked a guy here at the University and he said that there might be
> complications with "unsigned char" and "char". He gave me the example
> below. Please answer at a novice level, my C++ and Unix knowledge is very
> limited.  

Good hunch, but given that some accented letters work and some give
problems, I wouldn't expect that it's a problem with sign extension.
This seems to point to a problem with the ctype tables for your locale,
but there could be something else that I'm missing here.  Please keep
us posted.
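
One way to look at the ctype tables directly is a small test program along
these lines. This is only a sketch: case_pairs() is a made-up helper, not
part of ht://Dig, and the locale name you pass in has to be whatever your
system actually calls the Swedish locale.

```cpp
#include <cctype>
#include <clocale>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not ht://Dig code): list every byte value whose
// tolower() result differs from itself under the named locale.
// Returns an empty list if the locale cannot be selected.
std::vector<std::pair<int, int>> case_pairs(const char *locale_name)
{
    std::vector<std::pair<int, int>> pairs;

    // Remember the current LC_CTYPE setting so it can be restored.
    const char *prev = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = prev ? prev : "C";

    if (!std::setlocale(LC_CTYPE, locale_name))
        return pairs;               // locale not installed or misspelled

    // tolower() takes an int in the range 0..255 (or EOF), so loop over
    // ints rather than over plain char values.
    for (int c = 0; c < 256; ++c) {
        int lower = std::tolower(c);
        if (lower != c)
            pairs.push_back(std::make_pair(c, lower));
    }

    std::setlocale(LC_CTYPE, saved.c_str());
    return pairs;
}
```

In the plain "C" locale this should report only the 26 ASCII letters. If the
list for your Swedish locale has no entry for one of the accented capitals
that gives you trouble, that would point at the locale tables rather than
at htdig.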

>  htlib/StringMatch.cc
>  
>      while ((unsigned char)string[pos])
>      {
>          new_state = table[trans[string[pos]]][state];
>          
> Should be? or? 
>  
>      while (string[pos])

You don't need to take off the type cast on the "while" condition above,
but the trans[] array subscript below definitely should be type cast!
I'll fix this in the source.  However, this seems to be a problem only
in the StringMatch::Compare() method, which isn't used for looking at
words in documents or in the database.  It only affects a few internal
ASCII-only string matches, and the robots.txt disallow comparisons, so
unless you use upper-half characters in URLs, this bug shouldn't be a
problem (which explains how it's evaded detection this long).

>      {
>          new_state = table[trans[(unsigned char)string[pos]]][state];
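
Since you asked for a novice-level answer, here is a small stand-alone
illustration of why the subscript needs the cast. It is a sketch only: the
Table struct and the function names are invented for the example and are
not the real trans[] code from StringMatch.cc.

```cpp
#include <climits>

// Hypothetical stand-in for a 256-entry translation table like trans[].
// It is an identity mapping here, purely for illustration.
struct Table {
    unsigned char entry[256];
    Table()
    {
        for (int i = 0; i < 256; ++i)
            entry[i] = static_cast<unsigned char>(i);
    }
};

// True on platforms where plain "char" is a signed type.
bool plain_char_is_signed() { return CHAR_MIN < 0; }

// Subscript computed WITHOUT the cast. Where plain char is signed, a byte
// of 0x80 or above is sign-extended to a negative int, which would index
// memory in front of the table.
int raw_index(char c) { return c; }

// Subscript computed WITH the cast: always in the range 0..255.
int safe_index(char c) { return static_cast<unsigned char>(c); }

// Safe lookup, written the way the corrected subscript is.
unsigned char lookup(const Table &t, char c)
{
    return t.entry[safe_index(c)];
}
```

For the byte 0xE5 (a with ring in ISO-8859-1), safe_index() gives 229,
while raw_index() gives -27 on a system where plain char is signed. ASCII
characters give the same result either way, which is why the bug only
shows up with upper-half characters.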

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
