Title: Re: [aseek-users] Problems with charsets/unicode
The reason is that no charset is set for the page http://melior.univ-montp3.fr/, so the ASPseek indexer treats the word Pérez as two words: P and rez.
I don't know which name for iso88591 you use in the CharsetTableU1 directive, but if you use iso-8859-1, then that value must be listed first in the CharsetAlias directive.
If the cs parameter is set to iso88591, which differs from the value specified in CharsetTableU1, then the searcher also treats the word Pérez as two words and finds the page.
If the cs parameter is set to the value specified in CharsetTableU1, then the searcher can't find the word Pérez, because it is not in the index: the page was indexed without a charset.
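
To illustrate the splitting described above, here is a minimal sketch. It is not ASPseek code: the functions is_word_byte and split_words and the charset_known flag are hypothetical names, and the rule "without a charset only ASCII letters count as word characters" is an assumption made for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not taken from ASPseek: decides whether a byte
// belongs to a word. Without a known charset only ASCII letters count,
// so any byte >= 0x80 (such as 0xE9, "é" in ISO-8859-1) is a separator.
bool is_word_byte(unsigned char c, bool charset_known) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return true;
    // ISO-8859-1 letters occupy 0xC0..0xFF, except 0xD7 (multiplication
    // sign) and 0xF7 (division sign).
    return charset_known && c >= 0xC0 && c != 0xD7 && c != 0xF7;
}

// Splits text into words, using every non-word byte as a separator.
std::vector<std::string> split_words(const std::string& text, bool charset_known) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (is_word_byte(c, charset_known)) {
            current += static_cast<char>(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Usage example: the ISO-8859-1 bytes of "Pérez" with and without a charset.
void split_example() {
    const std::string perez = "P\xE9rez";
    assert((split_words(perez, false) == std::vector<std::string>{"P", "rez"}));
    assert((split_words(perez, true) == std::vector<std::string>{perez}));
}
```

In this sketch a page indexed without a charset stores P and rez, and a query is split the same way only when the searcher also lacks a matching charset table, which is consistent with the behaviour you see for the two cs values.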
 
Alexander.
 
----- Original Message -----
Sent: Tuesday, March 27, 2001 4:48 PM
Subject: Re: [aseek-users] Problems with charsets/unicode

Yesterday I found a bug of mine which can lead to improper indexing in the unicode
version.
Replace the line with the "return" statement in the method <bool
CUWord::operator==(const CUWord& Word) const> with
  return (*w == 0) && (*w1 == 0);
(file: ucharset.h)
and reindex everything.
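
The full body of CUWord::operator== is not shown in this message, so the following is only a sketch of the comparison the corrected return statement implies. The free function equal_words and the assumption that a word is a zero-terminated array of 16-bit code units are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for the comparison inside CUWord::operator==.
// Both arguments are assumed to be zero-terminated arrays of 16-bit units.
bool equal_words(const unsigned short* w, const unsigned short* w1) {
    // Advance while both words agree and the first has not ended.
    while (*w != 0 && *w == *w1) {
        ++w;
        ++w1;
    }
    // Equal only if BOTH words reached their terminator at the same time;
    // otherwise one word is a proper prefix of the other, or they differ.
    return (*w == 0) && (*w1 == 0);
}

// Usage example: a prefix must not compare equal to the longer word.
void compare_example() {
    const unsigned short p[] = {'P', 0};
    const unsigned short perez[] = {'P', 0xE9, 'r', 'e', 'z', 0};
    assert(!equal_words(p, perez));
    assert(!equal_words(perez, p));
    assert(equal_words(perez, perez));
}
```

Checking both terminators is what keeps a word such as P from being treated as equal to a longer word that merely starts with it.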

I've done that and I have the same problem:

http://melior dot univ-montp3 dot fr/aspseek/s.cgi?cs=iso88591&q=lambert+claviers+p%E9rez&ps=20&dt=back&dp=0

or

http://melior dot univ-montp3 dot fr/aspseek/s.cgi?q=lambert+claviers+p%E9rez&ps=20&dt=back&dp=0

gives:

1.      Bienvenue sur Melior [0.50220]

  ...Une journée sur XML et les documents électroniques s`est tenue le 19 mai 1999 à Lyon. Rapport par Gilles Pérez-Lambert Le livre d`or à vos claviers ! http:// Melior ......
http://melior.univ-montp3.fr/ 1 ko
Version indexée

and:

http://melior dot univ-montp3 dot fr/aspseek/s.cgi?cs=iso-8859-1&q=lambert+claviers+p%E9rez&ps=20&dt=back&dp=0

gives no result (Sorry, we didn't find...).


Gilles.
