Is there any cross-platform way to convert text between different codepages?
From what I know, only ANSI CP1250 or ISO 8859-1 text, or perhaps only a subset of them such as the first 128 characters (the ASCII range), can be converted to UTF-8 with AnsiToUTF8 and back.
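As a quick illustration of why the source codepage matters, here is a small Python sketch (Python only for brevity; `to_utf8` is a hypothetical helper, not an FPC or Lazarus function). The same byte decodes to different characters in CP1250 and ISO 8859-1, while the 7-bit ASCII range is identical in both.

```python
def to_utf8(data: bytes, codepage: str) -> bytes:
    """Decode bytes written in `codepage` and re-encode them as UTF-8."""
    return data.decode(codepage).encode("utf-8")

raw = b"\xe8"  # a single byte whose meaning depends on the codepage
print(to_utf8(raw, "cp1250"))      # b'\xc4\x8d'  (U+010D, c with caron)
print(to_utf8(raw, "iso-8859-1"))  # b'\xc3\xa8'  (U+00E8, e with grave)

# Bytes 0x00..0x7F are plain ASCII in both codepages, so they pass through unchanged.
ascii_bytes = bytes(range(128))
print(to_utf8(ascii_bytes, "cp1250") == ascii_bytes)  # True
```

So a converter that only works for the ASCII subset, or for one fixed codepage, silently produces wrong characters for any other input.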
I have searched the web for alternative solutions. One option would be to use libiconv, which is available as a DLL for Windows, natively on *nix, or through libc on Linux. Another option would be to use the conversion tables from http://www.unicode.org/Public/MAPPINGS/. I looked at the libiconv sources and it uses a similar approach: internal conversion tables that map ANSI codepages to Unicode (its converters work on UCS-4 code points), from which the text can be converted back to another ANSI codepage.

During my investigation I also found something interesting about UCS and UTF. UCS-4 is now identical to UTF-32, but at one point UTF-32 was only a subset of UCS-4 (UCS-4 originally allowed code points up to 0x7FFFFFFF, while UTF-32 stops at U+10FFFF). UCS-2, however, is still a subset of UTF-16: it covers only the Basic Multilingual Plane, whereas UTF-16 also encodes characters above U+FFFF as surrogate pairs (two 16-bit units, i.e. 4 bytes). Many applications confuse the two, which is probably why so many platforms use UTF-16 rather than UTF-8. Windows NT at first supported only the UCS-2 subset, not full UTF-16, so things get tricky with characters that need more than 2 bytes in UTF-16. For this reason Java apparently stores strings as UTF-16 to save space but offers operations on single characters as 32-bit code points. FPC should probably have a similar strategy when operating on WideStrings or UTF-8 strings.
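To make the UCS-2 vs UTF-16 difference concrete, here is a minimal Python sketch (illustration only; `to_utf16_units` and `code_points` are hypothetical names). It splits a code point above U+FFFF into a surrogate pair and joins pairs back into code points, which is the step a code-point-oriented API over UTF-16 strings has to perform.

```python
def to_utf16_units(cp):
    """Split one Unicode code point into UTF-16 code units."""
    if cp < 0x10000:
        return [cp]  # BMP character: one 16-bit unit, same as UCS-2
    v = cp - 0x10000  # 20-bit value spread over two surrogates
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

def code_points(units):
    """Yield code points from a list of UTF-16 units, joining surrogate pairs."""
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield u  # BMP unit (an unpaired surrogate is passed through as-is)
            i += 1

text = "a\U0001D11E"  # 'a' plus MUSICAL SYMBOL G CLEF, which lies outside the BMP
units = [u for ch in text for u in to_utf16_units(ord(ch))]
print([hex(u) for u in units])                    # ['0x61', '0xd834', '0xdd1e']
print(len(units), len(list(code_points(units))))  # 3 2
```

A length counted in 16-bit units (3 here) differs from the number of characters (2), which is exactly where code that assumes UCS-2 goes wrong.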
What approach do you think would be best? If FPC doesn't yet have any support for converting between different charsets, I was thinking of implementing my own AnsiToUTF function with a parameter specifying the codepage, using the freely available conversion tables, and adding more encodings over time. This is mainly useful for processing text files and XML/HTML, which are not necessarily UTF-8 and may be windows-1252, iso-8859-1 or something else. This way FPC won't have an extra dependency on libiconv.
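A rough sketch of the table-driven idea, written in Python only to keep it short (an FPC version would do the same with a 256-entry array per codepage). It assumes the unicode.org MAPPINGS layout of "hex byte, whitespace, hex code point, # comment", with undefined bytes having no second column; `parse_mapping`, `ansi_to_utf8` and `utf8_to_ansi` are hypothetical names, and the inline sample is a three-line excerpt in the style of CP1252.TXT rather than the full file.

```python
def parse_mapping(text):
    """Parse MAPPINGS-style lines into a {byte: code point} dict."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the comment, split columns
        if len(fields) >= 2:  # undefined bytes have no Unicode column
            table[int(fields[0], 16)] = int(fields[1], 16)
    return table

def ansi_to_utf8(data, table):
    """Convert codepage bytes to UTF-8; unmapped bytes become U+FFFD."""
    chars = []
    for b in data:
        if b in table:
            chars.append(chr(table[b]))
        elif b < 0x80:
            chars.append(chr(b))  # assume ASCII identity if the table omits it
        else:
            chars.append("\ufffd")
    return "".join(chars).encode("utf-8")

def utf8_to_ansi(data, table):
    """Convert UTF-8 back to the codepage; unmappable characters become '?'."""
    reverse = {cp: b for b, cp in table.items()}
    out = bytearray()
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp in reverse:
            out.append(reverse[cp])
        elif cp < 0x80:
            out.append(cp)
        else:
            out.append(ord("?"))
    return bytes(out)

SAMPLE = """\
0x80\t0x20AC\t#EURO SIGN
0x81\t\t#UNDEFINED
0xE9\t0x00E9\t#LATIN SMALL LETTER E WITH ACUTE
"""
table = parse_mapping(SAMPLE)
utf8 = ansi_to_utf8(b"caf\xe9 \x80", table)
print(utf8.decode("utf-8"))       # café €
print(utf8_to_ansi(utf8, table))  # b'caf\xe9 \x80'
```

Going from one codepage to another is then just ansi_to_utf8 with the source table followed by utf8_to_ansi with the target table, which is essentially the pivot-through-Unicode approach libiconv takes.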
Razvan
_________________________________________________________________
archives at http://www.lazarus.freepascal.org/mailarchives