On Fri, Apr 11, 2008 at 4:26 AM, Ignacio Javier
>  What Koha is really doing, when decoding MARC-8/MARC21, is storing in the
>  database a sequence of:
>  base character + Unicode form of (tilde|acute|grave...etc)
>  ...that is:
>  a´, a`,n~, etc...
>  ...with ` or ´ or etc... in UTF-8 (using 3 native bytes instead of 2 native
>  bytes)
>  Instead of:
>  á, à, etc... (2 native bytes)
>  Internet Explorer, not surprisingly to me, renders a´ as á, etc., but no
>  other tools do it this way; for example, Firefox renders í as an i with the
>  acute accent placed above the dot.

The UTF-8 is valid; it just may not be in the ideal normalization
form.  The strings that MARC::Charset produces when it converts from
MARC-8 are in a decomposed Unicode normalization form, either NFD or
NFKD.  Some web browsers render NFD strings without any
difficulty, while others seem to work better with NFC.
Right now Koha passes UTF-8 strings to the browser without
renormalizing them, but perhaps we should automatically convert
them to NFC?
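
To make the difference concrete, here is a minimal sketch of what
renormalizing to NFC does to a decomposed string.  It is written in
Python purely for illustration and is not Koha code; the helper name
to_nfc is hypothetical.  In Perl, the NFC function from
Unicode::Normalize would be the likely counterpart, but where it would
be called in Koha is left open here.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Recompose base character + combining mark sequences (NFC).

    Hypothetical helper for illustration; not part of Koha or MARC::Charset.
    """
    return unicodedata.normalize("NFC", text)


# Decomposed input, as a MARC-8 conversion might produce it:
# 'a' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "a\u0301"
composed = to_nfc(decomposed)

# The decomposed form is two code points and three UTF-8 bytes;
# the composed form is one code point (U+00E1) and two UTF-8 bytes.
print(len(decomposed), len(decomposed.encode("utf-8")))
print(len(composed), len(composed.encode("utf-8")))
```

Both strings are valid UTF-8 and canonically equivalent; only the
composed one matches the 2-byte form described in the quoted message.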


Galen Charlton
Koha Application Developer
p: 1-888-564-2457 x709

Koha-devel mailing list
