Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-25 Thread Robert Pendell
On Wed, Sep 23, 2009 at 5:30 PM, Ross Smith wrote: Corinna Vinschen wrote: However, if we default to UTF-8 for a subset of languages anyway, it gets even more interesting to ask, why not for all languages?  Isn't it better in the long run to have the same default for all Cygwin

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-23 Thread Andy Koppe
2009/9/22 Corinna Vinschen: Therefore, when converting a UTF-16 Windows filename to the current charset, 0xDC?? words should be treated like any other UTF-16 word that can't be represented in the current charset: it should be encoded as a ^N sequence. (I started writing this before seeing

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-23 Thread Corinna Vinschen
On Sep 22 19:07, Corinna Vinschen wrote: On Sep 22 17:12, Andy Koppe wrote: True, but that's an implementation issue rather than a design issue, i.e. the ^N conversion needs to do the UTF-8 conversion itself rather than invoke the __utf8 functions. Shall I look into creating a patch? [...]

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-23 Thread Andy Koppe
2009/9/23 Corinna Vinschen: I have a local patch ready to use the ANSI codepage by default in the C locale.  It appears to work nicely and has the additional positive side effect to simplify the code in a few places. If I only knew that eastern language users could happily live with this

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-23 Thread Corinna Vinschen
On Sep 23 13:34, Andy Koppe wrote: 2009/9/23 Corinna Vinschen: I have a local patch ready to use the ANSI codepage by default in the C locale.  It appears to work nicely and has the additional positive side effect to simplify the code in a few places. If I only knew that eastern language

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-23 Thread Corinna Vinschen
On Sep 23 14:43, Corinna Vinschen wrote: On Sep 23 13:34, Andy Koppe wrote: 2009/9/23 Corinna Vinschen: I have a local patch ready to use the ANSI codepage by default in the C locale.  It appears to work nicely and has the additional positive side effect to simplify the code in a few

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-23 Thread Ross Smith
Corinna Vinschen wrote: However, if we default to UTF-8 for a subset of languages anyway, it gets even more interesting to ask, why not for all languages? Isn't it better in the long run to have the same default for all Cygwin installations? I'm really wondering if we shouldn't simply default

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-22 Thread Corinna Vinschen
On Sep 21 19:54, Andy Koppe wrote: 2009/9/21 Corinna Vinschen: As you might know, invalid bytes >= 0x80 are translated to UTF-16 by transposing them into the 0xdc00 - 0xdcff range by just or'ing 0xdc00. The problem now is that readdir() will return the transposed characters as if they are
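
The transposition described in the message above can be sketched in a few lines of C. This is only an illustration of the arithmetic: the function names are hypothetical and are not taken from the Cygwin sources. Because only bytes of 0x80 and above are escaped, the words actually produced fall in 0xDC80 - 0xDCFF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a byte that is not valid in the current
   charset to a UTF-16 word by or'ing in 0xDC00, as the thread describes. */
static uint16_t escape_invalid_byte(uint8_t b)
{
    assert(b >= 0x80);              /* only bytes >= 0x80 are escaped */
    return (uint16_t)(0xDC00 | b);
}

/* Hypothetical helper: true if a UTF-16 word lies in the range that
   the escaping above can produce (0xDC80 - 0xDCFF). */
static int is_escaped_byte(uint16_t w)
{
    return w >= 0xDC80 && w <= 0xDCFF;
}

/* Hypothetical helper: recover the original byte from an escaped word. */
static uint8_t unescape_byte(uint16_t w)
{
    assert(is_escaped_byte(w));
    return (uint8_t)(w & 0xFF);
}
```

With these helpers, the stray byte 0xDF from the test filenames becomes the word 0xDCDF, and masking with 0xFF gives 0xDF back. The difficulty raised in the thread is that a word in this range read back from disk cannot be told apart from an escaped byte by inspection alone.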

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-22 Thread Andy Koppe
2009/9/22 Corinna Vinschen: As you might know, invalid bytes >= 0x80 are translated to UTF-16 by transposing them into the 0xdc00 - 0xdcff range by just or'ing 0xdc00. The problem now is that readdir() will return the transposed characters as if they are the original characters. Yep,

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-22 Thread Corinna Vinschen
On Sep 22 17:12, Andy Koppe wrote: 2009/9/22 Corinna Vinschen: Therefore, when converting a UTF-16 Windows filename to the current charset, 0xDC?? words should be treated like any other UTF-16 word that can't be represented in the current charset: it should be encoded as a ^N sequence.
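
As a rough sketch of the proposal quoted above, the following C function writes one UTF-16 word as a ^N sequence. The excerpt does not spell out the byte layout, so this assumes a ^N sequence is the ASCII SO byte (0x0E) followed by the three-byte UTF-8-style encoding of the word; the function name and signature are hypothetical. The bit packing is done by hand because a strict UTF-8 encoder rejects lone surrogate values such as 0xDC??.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SO 0x0E  /* ASCII Shift Out, i.e. ^N */

/* Hypothetical helper: write SO followed by the three-byte UTF-8-style
   encoding of w into out. Returns the number of bytes written (4), or 0
   if the buffer is too small or w is below 0x0800 and so would not use
   the three-byte form. */
static size_t encode_so_sequence(uint16_t w, unsigned char *out, size_t outlen)
{
    assert(out != NULL);
    if (w < 0x0800 || outlen < 4)
        return 0;
    out[0] = SO;
    out[1] = (unsigned char)(0xE0 | (w >> 12));          /* top 4 bits    */
    out[2] = (unsigned char)(0x80 | ((w >> 6) & 0x3F));  /* middle 6 bits */
    out[3] = (unsigned char)(0x80 | (w & 0x3F));         /* low 6 bits    */
    return 4;
}
```

Under that assumption, the word 0xDCDF would be written as the four bytes 0x0E 0xED 0xB3 0x9F.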

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-21 Thread Corinna Vinschen
On Sep 16 00:38, Lapo Luchini wrote: Andy Koppe wrote: Hmm, we've lost the \xDF somewhere, and I'd guess it was when the filename got translated to UTF-16 in fopen(), which would explain what you're seeing More data: it's not simply the last character, it's something more complex than

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-21 Thread Andy Koppe
2009/9/21 Corinna Vinschen: % cat t.c int main() { fopen("a-\xF6\xE4\xFC\xDF", "w"); /* ISO-8859-1 */ fopen("b-\xF6\xE4\xFC\xDFz", "w"); fopen("c-\xF6\xE4\xFC\xDFzz", "w"); fopen("d-\xF6\xE4\xFC\xDFzzz", "w"); fopen("e-\xF6\xE4\xFC\xDF\xF6\xE4\xFC\xDF", "w"); return 0; } Ok, I see what

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-15 Thread Lapo Luchini
Andy Koppe wrote: Hmm, we've lost the \xDF somewhere, and I'd guess it was when the filename got translated to UTF-16 in fopen(), which would explain what you're seeing More data: it's not simply the last character, it's something more complex than that. % cat t.c int main() {

Re: [1.7] Invalid UTF8 while creating a file - cannot delete?

2009-09-10 Thread Andy Koppe
2009/9/10 Lapo Luchini: But the real problem with that test is not really what shows and how, the biggest problem is that it seems that filenames created with a wrong filename are quite limited in usage and can't seemingly be deleted. % export LANG=en_EN.UTF-8 % cat t.c #include <stdio.h>