Rich Felker wrote:
On Thu, Mar 01, 2007 at 09:41:44AM +0100, Marcel Ruff wrote:
Are you thinking of Java's _modified_ version of UTF-8
(http://en.wikipedia.org/wiki/UTF-8#Java)?
Ugh, disgusting...
Yes - this is an open & serious issue for my approach!
Does anybody have some practical advice on this?
Just treat the sequence c0 80 according to the spec, as an invalid
sequence. Neither it (because it's illegal UTF-8) nor a real NUL
(because it's illegal in text) should appear. If your problem is more
specific and there's a real reason you need to handle such data
differently, please describe what you're doing so we can offer better
advice.
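
For reference, a minimal sketch of the "treat c0 80 as invalid" advice on the Java side, assuming the receiver gets raw bytes and decodes them itself. The class and method names (StrictUtf8, decodeStrict, isAcceptableText) are made up for illustration. CharsetDecoder with CodingErrorAction.REPORT is the standard JDK mechanism for making malformed input throw instead of being silently replaced with U+FFFD; the outputs noted in the comments are what current JDKs should print.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    /** Decodes bytes as standard UTF-8 and throws on any malformed sequence. */
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** True if the bytes are well-formed UTF-8 and contain no real NUL. */
    public static boolean isAcceptableText(byte[] bytes) {
        try {
            return decodeStrict(bytes).indexOf('\0') < 0;
        } catch (CharacterCodingException e) {
            return false; // overlong forms such as c0 80 end up here
        }
    }

    public static void main(String[] args) {
        byte[] overlongNul = {(byte) 0xC0, (byte) 0x80}; // Java's modified-UTF-8 NUL
        byte[] realNul = {'a', 0, 'b'};                  // a genuine NUL byte in text
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println(isAcceptableText(overlongNul)); // false
        System.out.println(isAcceptableText(realNul));     // false
        System.out.println(isAcceptableText(plain));       // true
    }
}
```

Rejecting both cases up front keeps the policy in one place, so the rest of the program never sees either form of NUL.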
The first sentence of the Wikipedia section linked above says:
"In normal usage, the Java programming language supports standard
UTF-8 when reading and writing strings through InputStreamReader and
OutputStreamWriter."
That is what I use to access sockets, so no problems here.
But then it states that characters in the 'Supplementary Multilingual
Plane' are encoded incompatibly.
So must I assume that if I send 'mathematical alphanumeric symbols'
http://en.wikipedia.org/wiki/Mathematical_alphanumeric_symbols
like 'ℝ' from C to Java, they will be corrupted?
Both applications work with what they think is 'UTF-8' ...
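
One way to check this is to compare the bytes directly. The sketch below is only an illustration, and the class and helper names (Utf8Variants, standardUtf8, modifiedUtf8, hex) are hypothetical. It assumes the "standard" path is String.getBytes with an explicit UTF-8 charset (the same encoding an OutputStreamWriter created with UTF-8 produces) and the "modified" path is DataOutputStream.writeUTF. Note that 'ℝ' is U+211D, which lies in the Basic Multilingual Plane, so it comes out the same either way; U+1D54A (double-struck capital S) is used here as a character that really is outside the BMP.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Variants {
    /** Standard UTF-8 bytes for the string. */
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Java's modified UTF-8 as written by writeUTF, without the length prefix. */
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length); // drop the 2-byte length prefix
    }

    /** Space-separated uppercase hex dump. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bmp = "\u211D";                                  // U+211D, inside the BMP
        String supp = new String(Character.toChars(0x1D54A));   // U+1D54A, outside the BMP

        System.out.println(hex(standardUtf8(bmp)));   // E2 84 9D
        System.out.println(hex(modifiedUtf8(bmp)));   // E2 84 9D
        System.out.println(hex(standardUtf8(supp)));  // F0 9D 95 8A
        System.out.println(hex(modifiedUtf8(supp)));  // ED A0 B5 ED B5 8A
        System.out.println(hex(modifiedUtf8("\0")));  // C0 80
    }
}
```

If the Java side only ever uses InputStreamReader and OutputStreamWriter with UTF-8, the modified form should not appear on the wire; it shows up with APIs such as writeUTF/readUTF, JNI string functions, and serialization.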
Marcel
Rich
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/