[Koha-bugs] [Bug 14759] Replacement for Text::Unaccent

bugzilla-daemon Tue, 08 Dec 2015 14:12:08 -0800

http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14759


--- Comment #15 from David Cook <[email protected]> ---
(In reply to Galen Charlton from comment #14)
> Other way around: Text::Unaccent is not, as it would be much preferable,
> emitting Perl Unicode strings; rather, it is emitting octet-sequences.

Sorry, I must have been unclear. I meant to say that Text::Unaccent is emitting
octet-sequences (which is why using decode() on the string returned by
Text::Unaccent would create a Perl Unicode string).

I also meant that Perl itself was causing problems when it tried to build a new
string by combining an octet-sequence string with a Perl Unicode string.
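
A minimal sketch of that failure mode, assuming only the core Encode module (the
variable names are illustrative and not taken from Koha). When an octet string
and a character string are concatenated, Perl treats each octet as a Latin-1
character, so the UTF-8 bytes end up as mojibake:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A Perl character (Unicode) string: one character, U+00E9 (e with acute).
my $chars = "\x{e9}";

# The same text as UTF-8 octets: two bytes, 0xC3 0xA9.
# This stands in for the kind of value an octet-emitting module returns.
my $octets = encode('UTF-8', $chars);

# Mixing the two: each octet is treated as a separate Latin-1 character,
# so the result has three characters instead of two.
my $mixed = $octets . $chars;

# Decoding at the boundary first keeps everything as characters.
my $clean = decode('UTF-8', $octets) . $chars;

printf "%d %d\n", length($mixed), length($clean);
```

Decoding once, where the octets enter the program, avoids the problem without
sprinkling Encode calls through the rest of the code.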

> A good pattern to aim for is using *only* Unicode strings within core code,
> and relegating use of Encode and friends to input and output; Text::Unaccent
> would get in the way of that.

Fair enough. I'm not in favour of Text::Unaccent per se. I was curious why it
seemed to mangle some strings, and I shared what answers I found. 

I suspect Unicode::Normalize will really be the way to go, as you suggest. It
seems much more comprehensive than Text::Unaccent and Text::Unaccent::PurePerl.
I imagine we just need feedback from people experienced in Arabic, Hebrew, and
CJK languages.
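
A rough sketch of what a Unicode::Normalize-based approach could look like,
assuming the input is already a Perl Unicode string. The helper name
strip_marks is hypothetical and this is not the patch proposed on the bug. It
decomposes the text with NFD and then removes nonspacing marks:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: expects a Perl character string, returns one.
sub strip_marks {
    my ($text) = @_;

    # Canonical decomposition: U+00E9 becomes "e" followed by U+0301.
    my $decomposed = NFD($text);

    # Remove nonspacing marks (general category Mn), such as combining accents.
    $decomposed =~ s/\p{Mn}+//g;

    return $decomposed;
}

my $plain = strip_marks("\x{e9}t\x{e9}");   # French "ete" with two accents
```

Note that category Mn also covers Hebrew vowel points and Arabic vowel marks,
so whether removing them is the right behaviour is exactly the kind of question
that needs input from people who work with those scripts.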

