Title: RE: Roundtripping in Unicode
Marcin 'Qrczak' Kowalczyk wrote:
You are trying to stick with processing byte sequences, carefully
preserving the storage format instead of preserving the meaning in
terms of Unicode characters. This leads to less robust software
which is not certain
Title: RE: Nicest UTF
Marcin 'Qrczak' Kowalczyk wrote:
My my, you are assuming all files are in the same encoding.
Yes. Otherwise nothing shows filenames correctly to the user.
UNIX is a multi user system. One user can use one locale and might never see files from another user that uses
Title: RE: Nicest UTF
D. Starner wrote:
Lars Kristan writes:
A system administrator (because he has access to all files).
My my, you are assuming all files are in the same encoding.
And what about all the references to the files in scripts? In
configuration files? Soft links?
Lars Kristan [EMAIL PROTECTED] writes:
But, as I once already said, you can do it with UTF-8, you simply
keep the invalid sequences as they are, and really handle them
differently only when you actually process them or display them.
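A minimal sketch of that idea, assuming filenames are stored as raw bytes and decoded only at display time. The helper name display_name and the sample filename are hypothetical, and Python's errors="replace" handler is just one possible way to "handle them differently" when displaying:

```python
def display_name(raw: bytes) -> str:
    # Decode only for display; each undecodable byte is shown as U+FFFD.
    return raw.decode("utf-8", errors="replace")

# Hypothetical filename: 0xE9 is Latin-1 e-acute, which is not valid UTF-8 here.
stored = b"report-\xe9.txt"
shown = display_name(stored)

# The stored bytes are never modified; only the displayed form differs.
print(stored, repr(shown))
```

The display string is lossy (the original byte cannot be recovered from U+FFFD), so anything that must open or rename the file still has to use the stored bytes.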
UTF-8 is painful to process in the first place. You are
Title: RE: Roundtripping in Unicode
Philippe Verdy wrote:
An implementation that uses UTF-8 for valid strings could use the invalid
ranges for lead bytes to encapsulate invalid byte values. Note however that
invalid bytes you would need to represent have 256 possible values, but
Title: RE: RE: Roundtripping in Unicode
Philippe VERDY wrote:
If a source sequence is invalid, and you want to preserve it,
then this sequence must remain invalid if you change its encoding.
So there's no need for Unicode to assign valid code points
for invalid source data.
Using
Mark Davis schrieb:
This is just a confusion among the hoi polloi.
And here we have yet another example: "hoi" is Greek for "the"
(hoi polloi = the many).
Best wishes,
Otto Stolz
Lars Kristan [EMAIL PROTECTED] writes:
And once we understand that things are manageable and not as
frightening as it seems at first, then we can stop using this as an
argument against introducing 128 codepoints. People who will find
them useful should and will bother with the consequences.
Thanks. I have gotten several messages from people who didn't get the joke;
that the sentence itself was an example of just the sorts of redundancy
being discussed.
Mark
- Original Message -
From: Otto Stolz [EMAIL PROTECTED]
To: Mark Davis [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent:
On Dec 10, 2004, at 1:25 PM, Tim Greenwood wrote:
Is that like the 'Please RSVP' that I see all too often? Or should
that not be excused?
Or -- my own personal favorite -- in the year AD 2004.
Some won't convert any and will just start using UTF-8
for new ones. And this should be allowed.
Why should it be allowed? You can't mix items with
different unlabeled encodings willy-nilly. All you're going
to get, all you can expect to get is a mess.
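The "mess" can be illustrated with a short sketch. The filename below is hypothetical, and ISO-8859-2 is chosen only because it comes up elsewhere in the thread; the point is that the same name yields different bytes under each encoding, and reading one as the other either garbles the text or fails outright:

```python
name = "\u017eaba.txt"  # hypothetical filename starting with U+017E (z with caron)

utf8_bytes = name.encode("utf-8")
latin2_bytes = name.encode("iso-8859-2")

# UTF-8 bytes read by a user whose locale is ISO-8859-2: decodes, but garbled.
misread = utf8_bytes.decode("iso-8859-2")

# ISO-8859-2 bytes read strictly as UTF-8: the decoder rejects them.
try:
    latin2_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(repr(misread), strict_ok)
```

Without a label saying which encoding each item uses, a program cannot tell these cases apart reliably.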
--
On 13/12/2004 14:57, Peter R. Mueller-Roemer wrote:
...
Did you make a formal proposal for NBSP, as Elaine did for Samaritan ?
The use of NBSP to carry combining marks is nothing to do with me. It
has been in Unicode from the start, as an alternative to SPACE. Not long
ago several problems were
Lars Kristan wrote: What I was talking about in the paragraph in question is what happens if you want to take unassigned codepoints and give them a new status.
You don't need to do that. No Unicode application must assign semantics to unassigned codepoints.
If a source sequence is invalid, and you
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Peter R. Mueller-Roemer
Sent: Monday, December 13, 2004 4:58 PM
To: Peter Kirk
Cc: Unicode Mailing List
Subject: Re: Thanks: auto loading Hebrew and Russian fonts ; Re: Unicode HTML,
Ken is absolutely right. It would be theoretically possible to add 128 code
points that would allow one to roundtrip a bytestream after passing through
a UTF-8 = UTF-32 conversion. (For that matter, it would be possible to add
2048 code points that would allow the same for a 16-bit data stream.)
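For illustration only, here is how such a 128-code-point escape looks in practice. This is not the proposal discussed in the thread: it uses Python's "surrogateescape" error handler, which maps each undecodable byte 0x80..0xFF to a lone surrogate U+DC80..U+DCFF rather than to newly assigned code points. The helper names and the sample bytes are hypothetical.

```python
def bytes_to_text(raw: bytes) -> str:
    # Undecodable bytes become lone surrogates U+DC80..U+DCFF.
    return raw.decode("utf-8", errors="surrogateescape")

def text_to_bytes(text: str) -> bytes:
    # Lone surrogates U+DC80..U+DCFF are turned back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")

# 0xE9 is invalid as UTF-8 here; 0xC5 0xBE is valid UTF-8 for U+017E.
raw = b"caf\xe9 \xc5\xbe"
text = bytes_to_text(raw)
restored = text_to_bytes(text)

print(restored == raw)
```

The intermediate string is not well-formed Unicode text: encoding it with the default strict handler raises UnicodeEncodeError, so the invalid data stays detectable as invalid.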
-Original Message-
Date/Time:Sun Dec 12 18:57:05 CST 2004
Contact: [EMAIL PROTECTED]
Report Type: Other Question, Problem, or Feedback
Opt Subject: Displaying Chinese characters and Chu Nom characters
Dear Unicode,
I am using Windows XP Pro, and have the Chinese simplified
Title: RE: RE: RE: Roundtripping in Unicode
Philippe VERDY wrote:
I don't think I miss the point. My suggested approach to
perform roundtrip conversions between UTF's while keeping all
invalid sequences as invalid (for the standard UTFs), is much
less risky than converting them to
Title: RE: Roundtripping in Unicode
Kenneth Whistler wrote:
Lars Kristan stated:
I said, the choice is yours. My proposal does not prevent you from doing it
your way. You don't need to change anything and it will still work the way
it worked before. OK? I just want 128 codepoints
Title: RE: Roundtripping in Unicode
Ken is absolutely right. It would be theoretically possible to add 128 code
points that would allow one to roundtrip a bytestream after passing through
a UTF-8 = UTF-32 conversion. (For that matter, it would be possible to add
2048 code points
If I have understood this correctly, filenames are not in a locale, they
are absolute. Users, on the other hand, are in a locale, and users view
filenames. The same filename can look different to two different users. To
user A (whose locale is Latin-1), a filename might look valid; to user B
Title: Validity and properties of U+FFFD (was RE: Roundtripping in Unicode)
Doug Ewell wrote:
Philippe VERDY wrote:
(In fact I also think that mapping invalid sequences to U+FFFD is also
an error, because U+FFFD is valid, and the presence of the encoding
error in the source is
Doug Ewell scripsit:
When faced with [an] ill-formed code unit sequence while transforming
or interpreting text, a conformant process must treat the first code
unit... as an illegally terminated code unit sequence -- for example, by
signaling an error, filtering the code unit out, or
Title: RE: Roundtripping in Unicode
Peter Kirk wrote:
Now no doubt many Unix filename handling utilities ignore the fact that
some octets are invalid or uninterpretable in the locale, because they
handle filenames as octet strings (with 0x00 and 0x2F having special
I like that. Makes total sense. Thanks.
Jill
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of Antoine Leca
Sent: 10 December 2004 17:38
To: Unicode
Subject: Re: When to validate?
As a result, your strings are likely to be some structures.
Then, it is pretty
Lars Kristan scripsit:
I'm using ISO-8859-2.
In fact you're lucky. Many ISO-8859-1 filenames display correctly in
ISO-8859-2. Not all users are so lucky.
It was a design point of ISO-8859-{1,2,3,4}, but not any other variants,
that every character appears either at the same codepoint or not
If I send the HTML message e.g. to the IVRIT-group, whose members
work on all kinds of systems, I do not want them to have problems
reading and printing my mail. Any remedy on the horizon?
Use plain text e-mail, or HTML e-mail generated by your mail client
(including Mozilla, Outlook or
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On
Behalf Of Peter R. Mueller-Roemer
Did you make a formal proposal for NBSP, as Elaine did for Samaritan?
I have studied Unicode's uniqueness rules, bidi algorithm and many
others and feel almost
Title: RE: Roundtripping in Unicode
Marcin 'Qrczak' Kowalczyk wrote:
UTF-8 is painful to process in the first place. You are making it
even harder by demanding that all functions which process UTF-8 do
something sensible for bytes which don't form valid UTF-8. They even
can't temporarily
From : Lars Kristan
Philippe VERDY wrote:
If a source sequence is invalid, and you want to preserve it,
then this sequence must remain invalid if you change its encoding.
So there's no need for Unicode to assign valid code points
for invalid source data.
Using invalid UTF-16
Lars Kristan stated:
I said, the choice is yours. My proposal does not prevent you from doing it
your way. You don't need to change anything and it will still work the way
it worked before. OK? I just want 128 codepoints so I can make my own
choice.
You have them: U+EE80..U+EEFF, which are
From: D. Starner [EMAIL PROTECTED]
Some won't convert any and will just start using UTF-8
for new ones. And this should be allowed.
Why should it be allowed? You can't mix items with
different unlabeled encodings willy-nilly. All you're going
to get, all you can expect to get is a mess.
When you
Philippe VERDY wrote:
(In fact I also think that mapping invalid sequences to U+FFFD is also
an error, because U+FFFD is valid, and the presence of the encoding
error in the source is lost, and will not throw exceptions in further
processings of the remapped text, unless the application
Paul stabbedupp at yahoo dot com wrote:
,,the Unicode number assigned to this character is U+21a38, and when
inputting this number into the Unicode look-up chart, the box is
blank. Is there a way i can make characters up in this block of
Unicode display in my browser, and if so, what would i
Paul wrote:
,,the Unicode number assigned to this character is U+21a38, and when
inputting this number into the Unicode look-up chart, the box is
blank. Is there a way i can make characters up in this block of
Unicode display in my browser, and if so, what would i have to do?