> Are the hits all capitalized, or do some of them have the lowercase ä?
> Does this problem happen consistently with certain accented letters, and
> not others? Do you have certain uppercase letters appearing in db.wordlist?
By "hits" I assume you mean the actual words from the documents. Only those
which are supposed to be capitalized are. For example: a search for "ättestupan"
renders 0 hits while a search for "Ättestupan" renders 18. The word is always
written as "Ättestupan" in the documents, so this would be natural if the search
were case sensitive.
The problem is that "åsa" and "Åsa" give exactly the same hits, even though the
name is always written as "Åsa". The problem only exists (as far as I can test)
for "Ää".
The db.wordlist only contains lowercase letters.
> > I asked a guy here at the University and he said that there might be
> > complications with "unsigned char" and "char". He gave me the example
> > below. Please answer at a novice level, my C++ and Unix knowledge is very
> > limited.
>
> Good hunch, but given that some accented letters work and some give
> problems, I wouldn't expect that it's a problem with sign extension.
> This seems to point to a problem with the ctype tables for your locale,
> but there could be something else that I'm missing here. Please keep
> us posted.
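To make the two suspects above concrete (sign extension versus the locale's
ctype tables), here is a minimal C++ sketch. It is not taken from the ht://Dig
source; all function names are made up for illustration, and it assumes the
documents are ISO-8859-1, where "Ä" is byte 0xC4 and "ä" is byte 0xE4.

```cpp
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical helper: the int value a byte turns into when it is stored in a
// signed char. Bytes above 0x7F become negative, which is the "sign extension"
// problem: ctype functions expect a value in 0..255 (or EOF).
int as_signed_index(unsigned char byte) {
    return static_cast<int>(static_cast<signed char>(byte));
}

// Hypothetical helper: lowercase one byte safely by keeping it unsigned, so
// the ctype table lookup always receives a value in 0..255.
unsigned char safe_lower(unsigned char byte) {
    return static_cast<unsigned char>(std::tolower(byte));
}

// Hypothetical helper: lowercase a word byte by byte. What happens to bytes
// such as 0xC4 depends entirely on the ctype table of the active locale.
std::string lower_word(const std::string& word) {
    std::string out;
    out.reserve(word.size());
    for (char c : word) {
        out.push_back(static_cast<char>(safe_lower(static_cast<unsigned char>(c))));
    }
    return out;
}

// Name of the locale whose ctype table std::tolower currently consults.
std::string current_ctype_locale() {
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    return name ? std::string(name) : std::string();
}
```

In the default "C" locale, safe_lower(0xC4) leaves the byte unchanged, because
that locale's table only knows A-Z. A locale whose ctype table is incomplete
for some accented letters would lowercase some of them and not others, which
would match the symptom described. A sign-extension bug would instead be
expected to affect every byte above 0x7F in the same way.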
I'm also looking for a synonym wordlist in Swedish. If anyone has one, please
send me a copy.