According to Philippe Ramkvist-Henry:
> > Are the hits all capitalized, or do some of them have the lowercase ä?
> > Does this problem happen consistently with certain accented letters, and
> > not others? Do you have certain uppercase letters appearing in db.wordlist?
>
> With hits you mean the actual words from the document, I guess. Well, only
> those which are supposed to be capitalized are. For example: a search for
> "ättestupan" renders 0 hits while a search for "Ättestupan" renders 18. The
> word is always written as "Ättestupan" in the documents, so this would be
> natural if the search was case sensitive. The problem is that "Åsa" and
> "åsa" give the exact same hits, and it's also always referred to as "Åsa".
> The problem only exists (as far as I can test) for "Ää".
>
> The db.wordlist only contains lowercase letters.

OK, so the word Ättestupan appears in there as ättestupan, correct?

Very strange. So searches for words containing Ä will find words with
ä in its place, as expected, but searches for words containing ä will
match neither ä nor Ä, is that right? I'm at a bit of a loss to explain
it, but at some point it would seem that htsearch is mangling the lower
case ä. Do you have any documents containing a lower case ä somewhere
in a word, and if so, does that word make it into db.wordlist correctly?

I still suspect a problem with ctype for your locale. Could you compile
and run the following C program on your system, and send me the output?
(Run it with the name of your locale, "sv", as an argument.)

Does using a locale of sv_SE (or even something else entirely, like fr or
fr_FR) make any difference in your results? And for the long-shot question,
do your documents use ISO 8859-1 (Latin 1) encoding, or are there some
that use a 7-bit encoding for Sweden?

-----------------------
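#include <ctype.h>
#include <locale.h>
#include <stdio.h>      /* printf() */

/* Dump the ctype classification of every byte value 0..255 under the
   locale named on the command line (or the default "C" locale if none). */
int main(int ac, char **av)
{
    int i;
    unsigned char c;
    char blank;

    if (ac > 1)
        setlocale(LC_ALL, av[1]);

    for (i = 0; i < 256; ++i) {
        printf("%3d 0x%02X: ", i, i);
        c = i;

        /* Show the character itself, or ^X / ~X notation if unprintable. */
        if (isprint(c))
            printf(" %c", c);
        else if (c < 0x80 && isprint(c ^ '@'))
            printf("^%c", c ^ '@');
        else if (isprint((c & 0x7F) ^ '@'))
            printf("~%c", (c & 0x7F) ^ '@');
        else
            printf("  ");

        /* isblank() is only tested when the C library provides it as a macro. */
#ifdef isblank
        blank = isblank(c) ? 'b' : '-';
#else
        blank = '?';
#endif

        /* One flag letter per classification; '-' means the test failed. */
        printf(" %c%c%c%c%c%c%c%c%c%c%c%c%c\n",
               isascii(c) ? 'A' : '-',
               isalpha(c) ? 'a' : '-',
               islower(c) ? 'l' : '-',
               isupper(c) ? 'u' : '-',
               isalnum(c) ? 'n' : '-',
               isdigit(c) ? 'd' : '-',
               isxdigit(c) ? 'x' : '-',
               isgraph(c) ? 'g' : '-',
               isprint(c) ? 't' : '-',
               ispunct(c) ? 'p' : '-',
               iscntrl(c) ? 'c' : '-',
               isspace(c) ? 's' : '-',
               blank);
    }
    return 0;
}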
-----------------------
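
The program above only shows how each byte is classified. Since the symptom
is about upper and lower case, it may also be worth checking whether
tolower() and toupper() map the accented letters back and forth consistently
under the same locale. The following is only a rough sketch of such a check,
not anything taken from the htdig sources; the function names
case_folds_cleanly and report_case_problems are made up for illustration, and
it assumes single-byte (8-bit) characters such as ISO 8859-1.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if byte c has a consistent case mapping.
   An uppercase letter must fold to a distinct lowercase letter that maps
   back to it. A lowercase letter may have no uppercase form at all
   (e.g. 0xDF and 0xFF in Latin 1), but if it has one, it must map back.
   Bytes that are neither upper nor lower case pass trivially. */
int case_folds_cleanly(unsigned char c)
{
    if (isupper(c)) {
        int lo = tolower(c);
        return lo != c && islower(lo) && toupper(lo) == c;
    }
    if (islower(c)) {
        int up = toupper(c);
        return up == c || (isupper(up) && tolower(up) == c);
    }
    return 1;
}

/* Hypothetical helper: selects the named locale, prints every high-bit
   byte whose case mapping looks inconsistent, and returns how many were
   found. Returns -1 if the locale name is not accepted by setlocale(). */
int report_case_problems(const char *locale_name)
{
    int i, bad = 0;

    if (locale_name && !setlocale(LC_ALL, locale_name)) {
        fprintf(stderr, "setlocale(\"%s\") failed\n", locale_name);
        return -1;
    }
    for (i = 0x80; i < 256; ++i) {
        if (!case_folds_cleanly((unsigned char)i)) {
            printf("0x%02X: tolower -> 0x%02X, toupper -> 0x%02X\n",
                   i, tolower(i), toupper(i));
            ++bad;
        }
    }
    return bad;
}
```

Calling report_case_problems("sv") and then report_case_problems("sv_SE")
from a small main() would show whether the two locales disagree. A return
value of -1 means the locale name was not accepted on that system, which
would itself be a useful thing to know.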
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930