Christian Franke wrote:
Corinna Vinschen via Cygwin wrote:
On Jun 27 15:32, Christian Franke via Cygwin wrote:
$ touch $'t-\xef\x80\x80'
The name mapping is:
"t-\xEF\x80\x80" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
Did you copy/paste this from the old mail, by any chance?
Sorry, I accidentally mixed two cases with same readdir() result:
"t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000" -(readdir)-> "t-"
"t-\xED\xAD\x99" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
$ touch $'t-\xed\xad\x99'
$ touch $'t-\xef\x80\x80'
$ ls | uniq -c
2 t-
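For anyone who wants to double-check the two mappings above without
touching the file system, here is a small Python sketch. It only decodes
the bytes the way a lenient UTF-8 decoder would (Python's "surrogatepass"
error handler) and prints the resulting code points. The helper name
code_units() is made up for this mail, and the sketch says nothing about
how Cygwin itself does the conversion.

```python
def code_units(name: bytes):
    """Decode a file name as UTF-8, letting encoded surrogates through,
    and return the code points as 4-digit hex strings.

    All characters used here are in the BMP, so each code point is also
    a single UTF-16 code unit.
    """
    text = name.decode("utf-8", "surrogatepass")
    return [f"{ord(ch):04X}" for ch in text]


# \xEF\x80\x80 is the UTF-8 form of U+F000 (private use area)
print(code_units(b"t-\xef\x80\x80"))   # ['0074', '002D', 'F000']
# \xED\xAD\x99 is a 3-byte encoding of the high surrogate U+DB59
print(code_units(b"t-\xed\xad\x99"))   # ['0074', '002D', 'DB59']
```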
This no longer occurs in 3.7.0-0.165.g1b60f4861b70, but see below.
...
...
I'll apply the patch shortly.
$ touch $'t-\xed\xad\x90'
$ touch $'t-\xed\xad\x91'
$ touch $'t-\xed\xad\x92'
$ touch $'t-\xed\xad\x93'
$ touch $'t-\xed\xad\x94'
$ ls | uniq -c
5 t-
$ ls -s
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
total 0
? t- ? t- ? t- ? t- ? t-
Several runs of the attached test program with different seeds produced
results that all have one thing in common: the Windows path name
contains an invalid word in the UTF-16 high surrogate range:
$ ./randnames 42
$'t-\xEC\x9E\xB3\xEF\x82\x80\xEF\x83\xA0': access() failed, errno=2:
$'t-\xED\xA4\xA8\x80\xE0': original path
L"t-\xD928\xF080\xF0E0": Windows path
$'t-\xEE\x9E\xB3\xEF\x83\xA1': access() failed, errno=2:
$'t-\xED\xA6\xB0\xE1': original path
L"t-\xD9B0\xF0E1": Windows path
...
$'t-\xE7\xBE\xB3\xEF\x82\xB3': access() failed, errno=2:
$'t-\xED\xA2\x96\xB3': original path
L"t-\xD896\xF0B3": Windows path
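The common property of the three Windows paths above can be checked
mechanically. The following Python sketch scans a list of UTF-16 code
units for high surrogates (0xD800..0xDBFF) that are not followed by a
low surrogate (0xDC00..0xDFFF). The function name is hypothetical and
the code unit lists are typed in by hand from the output above; this is
not part of the attached test program.

```python
def lone_high_surrogates(units):
    """Return the indexes of high surrogates that are not followed
    by a low surrogate."""
    lone = []
    for i, u in enumerate(units):
        if 0xD800 <= u <= 0xDBFF:
            nxt = units[i + 1] if i + 1 < len(units) else None
            if nxt is None or not (0xDC00 <= nxt <= 0xDFFF):
                lone.append(i)
    return lone


# Windows paths from the randnames output, as UTF-16 code units
windows_paths = [
    [0x74, 0x2D, 0xD928, 0xF080, 0xF0E0],
    [0x74, 0x2D, 0xD9B0, 0xF0E1],
    [0x74, 0x2D, 0xD896, 0xF0B3],
]
for path in windows_paths:
    # Each path has exactly one lone high surrogate, at index 2
    print([hex(path[i]) for i in lone_high_surrogates(path)])
```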
A closer look reveals two problems:
1.) A lone high surrogate is not encoded correctly. This could be fixed
with this patch:
https://cygwin.com/pipermail/cygwin-patches/2025q2/014001.html
2.) A high surrogate at the very end of the string is not encoded at
all. A fix would require enhancing the interface between __*_wctomb()
and the outer functions: the outer loop would need to call the function
again after L'\0' has occurred.
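To illustrate why the second problem needs an extra call at the end,
here is a toy model in Python. It is not the Cygwin code and does not
mirror the real __*_wctomb() interface; encode_units() and its
flush_at_end parameter are invented for this sketch. The converter has
to hold back a high surrogate until it has seen the next code unit, so
a high surrogate in the last position is lost unless something flushes
the pending state once the input ends. Emitting a lone surrogate as a
3-byte sequence is an assumption here, based on the "original path"
names in the output above.

```python
def encode_units(units, flush_at_end=True):
    """Toy stateful UTF-16 -> UTF-8 converter (illustration only)."""
    out = bytearray()
    pending = None  # high surrogate waiting for its low partner

    def emit(cp):
        # "surrogatepass" lets a lone surrogate through as 3 bytes
        out.extend(chr(cp).encode("utf-8", "surrogatepass"))

    for u in units:
        if pending is not None:
            if 0xDC00 <= u <= 0xDFFF:
                # Valid pair: combine into one supplementary code point
                emit(0x10000 + ((pending - 0xD800) << 10) + (u - 0xDC00))
                pending = None
                continue
            emit(pending)  # lone high surrogate in the middle
            pending = None
        if 0xD800 <= u <= 0xDBFF:
            pending = u    # cannot decide yet, wait for the next unit
        else:
            emit(u)
    # The "call again after L'\0'" step: flush what is still pending
    if flush_at_end and pending is not None:
        emit(pending)
    return bytes(out)


name = [0x74, 0x2D, 0xD876]                    # L"t-\xD876"
print(encode_units(name, flush_at_end=False))  # b't-'
print(encode_units(name))                      # b't-\xed\xa1\xb6'
```

Without the final flush the model produces "t-", which is the same
shape of result as the readdir() output shown earlier.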
BTW: if the file name consists only of a single high surrogate, an
interesting corner case of readdir() is visible:
$ echo foo >$'\uD876' # Windows name: L"\xD876"
$ cat $'\uD876'
foo
$ ls
$ ls -a | uniq -c
1 .
2 ..
--
Regards,
Christian