On Wed, Sep 23, 2009 at 5:30 PM, Ross Smith wrote:
Corinna Vinschen wrote:
However, if we default to UTF-8 for a subset of languages anyway, it
gets even more interesting to ask, why not for all languages? Isn't it
better in the long run to have the same default for all Cygwin
2009/9/22 Corinna Vinschen:
Therefore, when converting a UTF-16 Windows filename to the current
charset, 0xDC?? words should be treated like any other UTF-16 word
that can't be represented in the current charset: it should be encoded
as a ^N sequence.
(I started writing this before seeing
On Sep 22 19:07, Corinna Vinschen wrote:
On Sep 22 17:12, Andy Koppe wrote:
True, but that's an implementation issue rather than a design issue,
i.e. the ^N conversion needs to do the UTF-8 conversion itself rather
than invoke the __utf8 functions. Shall I look into creating a patch?
[...]
2009/9/23 Corinna Vinschen:
I have a local patch ready to use the ANSI codepage by default in the
C locale. It appears to work nicely and has the additional positive
side effect to simplify the code in a few places.
If I only knew that eastern language users could happily live with
this
On Sep 23 13:34, Andy Koppe wrote:
2009/9/23 Corinna Vinschen:
I have a local patch ready to use the ANSI codepage by default in the
C locale. It appears to work nicely and has the additional positive
side effect to simplify the code in a few places.
If I only knew that eastern language
On Sep 23 14:43, Corinna Vinschen wrote:
On Sep 23 13:34, Andy Koppe wrote:
2009/9/23 Corinna Vinschen:
I have a local patch ready to use the ANSI codepage by default in the
C locale. It appears to work nicely and has the additional positive
side effect to simplify the code in a few
Corinna Vinschen wrote:
However, if we default to UTF-8 for a subset of languages anyway, it
gets even more interesting to ask, why not for all languages? Isn't it
better in the long run to have the same default for all Cygwin
installations?
I'm really wondering if we shouldn't simply default
On Sep 21 19:54, Andy Koppe wrote:
2009/9/21 Corinna Vinschen:
As you might know, invalid bytes >= 0x80 are translated to UTF-16 by
transposing them into the 0xdc00 - 0xdcff range by just or'ing 0xdc00.
The problem now is that readdir() will return the transposed characters
as if they are
2009/9/22 Corinna Vinschen:
As you might know, invalid bytes >= 0x80 are translated to UTF-16 by
transposing them into the 0xdc00 - 0xdcff range by just or'ing 0xdc00.
The problem now is that readdir() will return the transposed characters
as if they are the original characters.
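For readers following along, the transposition described above can be sketched in a few lines of C. This is only an illustration of the or'ing step as stated in the message, not Cygwin's actual code; the function names transpose_invalid_byte, is_transposed_byte and recover_byte are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a byte that cannot be decoded in the current
   charset to a UTF-16 code unit by or'ing 0xdc00, as the message describes. */
uint16_t transpose_invalid_byte(uint8_t b)
{
    assert(b >= 0x80);              /* only bytes >= 0x80 are transposed */
    return (uint16_t)(0xDC00u | b);
}

/* Hypothetical helper: does this UTF-16 code unit fall in the range that
   transposed bytes >= 0x80 end up in (0xdc80 - 0xdcff)? */
int is_transposed_byte(uint16_t w)
{
    return w >= 0xDC80 && w <= 0xDCFF;
}

/* Hypothetical helper: drop the 0xdc00 marker again. Doing this blindly on
   the way back is what makes the transposed value look like an original
   character to the caller. */
uint8_t recover_byte(uint16_t w)
{
    return (uint8_t)(w & 0xFF);
}
```

With these definitions, the byte 0xDF from the ISO-8859-1 test filenames becomes the code unit 0xDCDF, and masking off the high byte gives 0xDF back.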
Yep,
On Sep 22 17:12, Andy Koppe wrote:
2009/9/22 Corinna Vinschen:
Therefore, when converting a UTF-16 Windows filename to the current
charset, 0xDC?? words should be treated like any other UTF-16 word
that can't be represented in the current charset: it should be encoded
as a ^N sequence.
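A rough sketch of what such a ^N sequence could look like, assuming it consists of the ASCII SO byte (0x0E, Ctrl-N) followed by the UTF-8 style encoding of the single UTF-16 code unit. The exact byte layout is an assumption made for illustration, and encode_so_sequence is a hypothetical name, not a Cygwin function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SO_PREFIX 0x0E  /* ASCII SO, i.e. Ctrl-N (assumed prefix byte) */

/* Hypothetical helper: write the prefix byte followed by the UTF-8 style
   encoding of one UTF-16 code unit into out, which must hold 4 bytes.
   Surrogate values such as 0xDC?? are encoded like any other 16-bit value,
   which is the treatment proposed in the message. Returns bytes written. */
size_t encode_so_sequence(uint16_t w, unsigned char out[4])
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = SO_PREFIX;
    if (w < 0x80) {
        out[n++] = (unsigned char)w;
    } else if (w < 0x800) {
        out[n++] = (unsigned char)(0xC0 | (w >> 6));
        out[n++] = (unsigned char)(0x80 | (w & 0x3F));
    } else {
        out[n++] = (unsigned char)(0xE0 | (w >> 12));
        out[n++] = (unsigned char)(0x80 | ((w >> 6) & 0x3F));
        out[n++] = (unsigned char)(0x80 | (w & 0x3F));
    }
    return n;
}
```

Under that assumption, the code unit 0xDCDF would come out as the four bytes 0x0E 0xED 0xB3 0x9F, so it stays distinguishable from a genuine 0xDF in the current charset.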
On Sep 16 00:38, Lapo Luchini wrote:
Andy Koppe wrote:
Hmm, we've lost the \xDF somewhere, and I'd guess it was when the
filename got translated to UTF-16 in fopen(), which would explain what
you're seeing
More data: it's not simply the last character, it's something more
complex than
2009/9/21 Corinna Vinschen:
% cat t.c
#include <stdio.h>

int main() {
    fopen("a-\xF6\xE4\xFC\xDF", "w"); //ISO-8859-1
    fopen("b-\xF6\xE4\xFC\xDFz", "w");
    fopen("c-\xF6\xE4\xFC\xDFzz", "w");
    fopen("d-\xF6\xE4\xFC\xDFzzz", "w");
    fopen("e-\xF6\xE4\xFC\xDF\xF6\xE4\xFC\xDF", "w");
    return 0;
}
Ok, I see what
Andy Koppe wrote:
Hmm, we've lost the \xDF somewhere, and I'd guess it was when the
filename got translated to UTF-16 in fopen(), which would explain what
you're seeing
More data: it's not simply the last character, it's something more
complex than that.
% cat t.c
int main() {
2009/9/10 Lapo Luchini:
But the real problem with that test is not really what shows and how,
the biggest problem is that it seems that filenames created with a
wrong filename are quite limited in usage and can't seemingly be deleted.
% export LANG=en_EN.UTF-8
% cat t.c
#include <stdio.h>