DO NOT REPLY TO THIS MESSAGE.  INSTEAD, POST ANY RESPONSES TO THE LINK BELOW.

[STR New]

Link: http://www.fltk.org/str.php?L2348
Version: 1.3-current


Without commenting on Manolo's proposal (which looks interesting)...

I remembered that I had read something about encoding unknown characters
in the Private Use Area of Unicode (U+E000 - U+F8FF); see chapter 16.5 of
the Unicode standard
<http://www.unicode.org/versions/Unicode5.2.0/ch16.pdf>.

The idea: pick a contiguous range of 128 codepoints from the private use
area (e.g. U+F880 - U+F8FF), and encode each illegal _byte_ (range
0x80-0xff) as the single Unicode codepoint U+F800 + <value of byte>.
Example: the Euro sign (€, 0x80 in CP1252) would be encoded as U+F880. We
only deal with single illegal bytes; these are converted to legal UTF-8
encodings in the private use area. When saving the file, this process can
be inverted. Maybe we would have to take special care of 0xfe and 0xff in
case they mapped to "non-character" Unicode codepoints (with the base
U+F800 they land on U+F8FE and U+F8FF, which are still ordinary
private-use codepoints), but we could also use 2 more allowed codepoints.
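
Just to make the mapping concrete, here is a minimal sketch in plain C++
(not FLTK code; the function name sanitize_to_utf8 is made up for this
mail). For brevity it treats every byte >= 0x80 as illegal; a real loader
would of course first try to decode the byte as part of a valid UTF-8
sequence and only remap the bytes that fail:

#include <cstddef>
#include <string>

// Append the 3-byte UTF-8 encoding of a codepoint in U+0800..U+FFFF.
static void append_utf8(std::string &out, unsigned cp) {
  out += (char)(0xE0 | (cp >> 12));
  out += (char)(0x80 | ((cp >> 6) & 0x3F));
  out += (char)(0x80 | (cp & 0x3F));
}

// Remap every "illegal" byte 0x80..0xff to U+F800 + <byte>, i.e. into
// U+F880..U+F8FF of the private use area. Simplification: every byte
// >= 0x80 is considered illegal here.
std::string sanitize_to_utf8(const unsigned char *buf, std::size_t len) {
  std::string out;
  for (std::size_t i = 0; i < len; i++) {
    if (buf[i] < 0x80)
      out += (char)buf[i];               // plain ASCII, copy unchanged
    else
      append_utf8(out, 0xF800 + buf[i]); // e.g. 0x80 (Euro in CP1252) -> U+F880
  }
  return out;
}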

Of course, these characters can't be displayed properly (they will
probably be rendered as unknown-character glyphs in all fonts), but they
can be identified and re-converted to the original encoding of the file.
All internal UTF-8 functions _should_ be able to deal with them, since
they are just ordinary 3-byte UTF-8 sequences.
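
For completeness, the inverse step when saving could look like this
(again only a sketch, restore_illegal_bytes is a made-up name): every
3-byte UTF-8 sequence that decodes to a codepoint in U+F880 - U+F8FF is
converted back to the single original byte, everything else is written
out unchanged.

#include <cstddef>
#include <string>

// Inverse step for saving: any 3-byte UTF-8 sequence that decodes to a
// codepoint in U+F880..U+F8FF is turned back into the single original
// byte (codepoint - 0xF800); everything else is copied through unchanged.
std::string restore_illegal_bytes(const std::string &utf8) {
  std::string out;
  for (std::size_t i = 0; i < utf8.size();) {
    unsigned char c0 = (unsigned char)utf8[i];
    if (c0 == 0xEF && i + 2 < utf8.size()) {
      unsigned char c1 = (unsigned char)utf8[i + 1];
      unsigned char c2 = (unsigned char)utf8[i + 2];
      if ((c1 & 0xC0) == 0x80 && (c2 & 0xC0) == 0x80) {
        unsigned cp = ((c0 & 0x0Fu) << 12) | ((c1 & 0x3Fu) << 6) | (c2 & 0x3Fu);
        if (cp >= 0xF880 && cp <= 0xF8FF) {
          out += (char)(cp - 0xF800);    // recover the original illegal byte
          i += 3;
          continue;
        }
      }
    }
    out += (char)c0;                     // ordinary UTF-8 byte, copy as-is
    i++;
  }
  return out;
}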

Maybe a combination of some of the recent ideas/proposals with this
encoding of illegal characters could do the trick.

