According to Philippe Ramkvist-Henry:
> > Are the hits all capitalized, or do some of them have the lowercase ä?
> > Does this problem happen consistently with certain accented letters, and
> > not others?  Do you have certain uppercase letters appearing in db.wordlist?
> 
> By hits you mean the actual words from the document, I guess. Well, only those
> which are supposed to be capitalized are. For example, a search for "ättestupan"
> returns 0 hits while a search for "Ättestupan" returns 18. In the documents the
> word is always written as "Ättestupan", so this would be natural if the search
> were case sensitive.
> The problem is that "Åsa" and "åsa" give exactly the same hits, and it's also
> always referred to as "Åsa". The problem only exists (as far as I can test) for
> "Ää".
> 
> The db.wordlist only contains lowercase letters.

OK, so the word Ättestupan appears in there as ättestupan, correct?
Very strange.  So searches for words containing Ä will find words with
ä in its place, as expected, but searches for words containing ä will
match neither ä nor Ä, is that right?  I'm at a bit of a loss to explain
it, but at some point it would seem that htsearch is mangling the lower
case ä.  Do you have any documents containing a lower case ä somewhere
in a word, and if so, does that word make it into db.wordlist correctly?

I still suspect a problem with ctype for your locale.  Could you compile
and run the following C program on your system, and send me the output?
(Run it with the name of your locale, "sv", as an argument.)

Does using a locale of sv_SE (or even something else entirely like fr or
fr_FR) make any difference in your results?  And for the long-shot question,
do your documents use ISO 8859-1 (Latin 1) encoding, or are there some
that use a 7-bit encoding for Sweden?

-----------------------
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Print each byte 0-255 with the ctype classes it belongs to in the
 * locale named by the first argument (default: the "C" locale). */
int main(int ac, char **av)
{
        int             i;
        unsigned char   c;

        if (ac > 1 && setlocale(LC_ALL, av[1]) == NULL)
                fprintf(stderr, "setlocale(\"%s\") failed\n", av[1]);

        for (i = 0; i < 256; ++i) {
                printf("%3d 0x%02X: ", i, i);
                c = i;
                if (isprint(c))
                        printf(" %c", c);
                else if (c < 0x80 && isprint(c ^ '@'))
                        printf("^%c", c ^ '@');  /* control, e.g. ^A */
                else if (isprint((c & 0x7F) ^ '@'))
                        /* non-printable high byte, shown via low 7 bits */
                        printf("~%c", (c & 0x7F) ^ '@');
                else
                        printf("  ");
                /* One flag per class; '-' means "not in this class". */
                printf("  %c%c%c%c%c%c%c%c%c%c%c%c%c\n",
                        isascii(c)  ? 'A' : '-',
                        isalpha(c)  ? 'a' : '-',
                        islower(c)  ? 'l' : '-',
                        isupper(c)  ? 'u' : '-',
                        isalnum(c)  ? 'n' : '-',
                        isdigit(c)  ? 'd' : '-',
                        isxdigit(c) ? 'x' : '-',
                        isgraph(c)  ? 'g' : '-',
                        isprint(c)  ? 't' : '-',
                        ispunct(c)  ? 'p' : '-',
                        iscntrl(c)  ? 'c' : '-',
                        isspace(c)  ? 's' : '-',
#ifdef  isblank
                        isblank(c)  ? 'b' : '-'
#else
                        '?'
#endif
                        );
        }
        return 0;
}
-----------------------

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
