Re: [Ferret-talk] Accented characters

Jens Kraemer Wed, 23 May 2007 04:38:14 -0700

On Wed, May 23, 2007 at 12:43:21PM +0200, Marcello parra wrote:
> > In the log, I get:
> > 
> > creating doc for class: Conta, id: 164
> > Adding field name with value 'JosÃ© Antonio' to index
> 
> 
> I included a word prejuízo... that should be translated to prejuizo...
> I put some code to output information when it builds the index. This is 
> what I get:
> 
> Analyzing: field:nome  str:prejuÃzo
> token["preju":0:5:1]
> token["zo":7:9:1]


With the script at http://pastie.caboo.se/63808 I get:

token["prejuizo":0:9:1]

It seems that Ferret doesn't recognize the í as a character and
therefore splits the word at this position.
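
To illustrate why the split lands exactly there, here is a minimal sketch of
a byte-oriented tokenizer that only accepts ASCII letters. It is not Ferret's
actual analyzer; the method name ascii_byte_tokens is made up for this
example, and it assumes a Ruby version with native string encodings (1.9 or
later). Because í is two bytes in UTF-8, both bytes act as separators, which
reproduces the byte offsets 0:5 and 7:9 from the quoted log.

```ruby
# Illustration only: a naive byte-oriented tokenizer, NOT Ferret's real analyzer.
# Only ASCII letters count as word characters, so every byte of a
# multi-byte UTF-8 character is treated as a separator.
def ascii_byte_tokens(str)
  tokens = []
  start = nil
  bytes = str.bytes.to_a
  bytes.each_with_index do |b, i|
    letter = (b >= 65 && b <= 90) || (b >= 97 && b <= 122)
    if letter
      start ||= i                       # open a token at the first letter
    elsif start
      tokens << [bytes[start...i].pack("C*"), start, i]
      start = nil                       # close the token at a non-letter byte
    end
  end
  tokens << [bytes[start..-1].pack("C*"), start, bytes.length] if start
  tokens
end

word = "preju\u00EDzo"   # "prejuízo"; the í occupies bytes 5 and 6
ascii_byte_tokens(word).each do |text, s, e|
  puts "token[#{text.inspect}:#{s}:#{e}]"
end
```

A tokenizer that is aware of the UTF-8 locale would treat í as a letter and
keep the word in one piece, spanning bytes 0 to 9.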

You have to make sure that everything in your environment uses UTF-8
as its character encoding for this to work (locale settings are
especially relevant to Ferret).
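
As a quick way to check the locale side of this, the sketch below inspects
the environment variables that normally decide the character locale of a
process. The helper names effective_ctype and utf8_locale? are hypothetical,
and the precedence LC_ALL, then LC_CTYPE, then LANG is the usual POSIX
convention, not something taken from Ferret itself.

```ruby
# Hypothetical helper: returns the locale value that governs character
# classification, using the usual POSIX precedence LC_ALL > LC_CTYPE > LANG.
def effective_ctype(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].each do |name|
    value = env[name]
    return value if value && !value.empty?
  end
  nil
end

# Hypothetical helper: true when the effective locale names a UTF-8 charset.
def utf8_locale?(env = ENV)
  !!(effective_ctype(env).to_s =~ /utf-?8/i)
end

puts "effective character locale: #{effective_ctype.inspect}"
puts "looks like UTF-8: #{utf8_locale?}"
```

If the second line prints false in the shell or server process that builds
the index, the locale is a likely place to look first.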

Jens

-- 
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
[EMAIL PROTECTED] | www.webit.de
 
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
_______________________________________________
Ferret-talk mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/ferret-talk
