I tried this with a UTF-8 encoded string (Japanese):

"\u304A\u308C\u3068\u9B5A".unpack("U*").pack("C*")

Which gives me this in return:

"u304Au308Cu3068u9B5A"

And that's not what I want stored in my index, right?
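
My best guess at what is going on, as a minimal sketch. It assumes Ruby 1.9 or later, where "\u..." is a real Unicode escape. Ruby 1.8 has no such escape, which would explain why the string above came back as plain "u304A..." text. unpack("U*") decodes the UTF-8 bytes into integer code points, and pack("C*") writes each integer back out as a single byte. That round trip only makes sense when every code point is below 256, which holds for "Behörde" but not for Japanese. The helper name fits_in_one_byte? is made up for illustration and is not part of Ferret or Ruby.

```ruby
# Assumption: Ruby 1.9+, so "\u00F6" is interpreted as a Unicode escape.
german   = "Beh\u00F6rde"               # "Behörde"
japanese = "\u304A\u308C\u3068\u9B5A"   # the string from this mail

# unpack("U*") decodes UTF-8 into an array of integer code points.
german_points   = german.unpack("U*")   # [66, 101, 104, 246, 114, 100, 101]
japanese_points = japanese.unpack("U*") # every value is far above 255

# pack("C*") writes each integer as one unsigned byte. For the German
# string this is a UTF-8 to Latin-1 conversion: 8 bytes in, 7 bytes out.
latin1_bytes = german_points.pack("C*")

# Hypothetical helper: true only if every code point fits in one byte,
# i.e. only if the unpack/pack pair can represent the string at all.
def fits_in_one_byte?(str)
  str.unpack("U*").all? { |code_point| code_point < 256 }
end

puts fits_in_one_byte?(german)    # true
puts fits_in_one_byte?(japanese)  # false
```

So if I read it right, the trick turns UTF-8 into a single-byte encoding, and that cannot carry Japanese text.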

Now, I'm pretty sure I'm doing something dumb :-) Hopefully someone can clarify.

Thanks.

On 3/22/07, Thomas Senf <[EMAIL PROTECTED]> wrote:
> David Balmain wrote:
> >
> > Unfortunately Ferret doesn't come with UTF-8 support in Windows as the
> > win32 runtime environment doesn't seem to support UTF-8. You will
> > therefore need to write your own analyzer on Windows if you want to
> > support UTF-8 searches.
> >
>
> Hello Star Burger,
>
> if you're planning to write your own UTF-8 Analyzer consider the
> unpack/pack duo:
>
> utf8_encoded_string_from_db.unpack("U*").pack("C*")
> @index << {:content => utf8_encoded_string_from_db}
> @index.search_each('content:Behörde') {|id,score| do_sth}
>
> I didn't try this in afa, but with ruby it worked in my case.
>
>
> --
> Posted via http://www.ruby-forum.com/.
> _______________________________________________
> Ferret-talk mailing list
> [email protected]
> http://rubyforge.org/mailman/listinfo/ferret-talk


-- 
Julio C. Ody
http://rootshell.be/~julioody