Christian Franke wrote:
Corinna Vinschen via Cygwin wrote:
On Jun 27 15:32, Christian Franke via Cygwin wrote:
$ touch $'t-\xef\x80\x80'
The name mapping is:
"t-\xEF\x80\x80" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
Did you copy/paste this from the old mail, by any chance?

Sorry, I accidentally mixed two cases with the same readdir() result:

"t-\xEF\x80\x80" -(open, ...)-> L"t-\xF000" -(readdir)-> "t-"
"t-\xED\xAD\x99" -(open, ...)-> L"t-\xDB59" -(readdir)-> "t-"
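For reference, here is how the two byte sequences above decode. The first is the regular UTF-8 encoding of U+F000, a private use code point. The second is not valid UTF-8 under RFC 3629, but it has the shape of a 3-byte sequence whose value 0xDB59 lies in the high surrogate range 0xD800-0xDBFF, so it ends up as a lone surrogate in the Windows name. This is a minimal sketch: decode_3byte() and is_high_surrogate() are hypothetical helpers, not Cygwin or newlib functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decode 1110xxxx 10xxxxxx 10xxxxxx into a 16-bit value.
   Deliberately lenient: unlike strict UTF-8 (RFC 3629) it does not reject
   ED A0 80..ED BF BF, so it shows which surrogate such bytes denote.
   Returns -1 if the three bytes do not have the 3-byte shape. */
int32_t decode_3byte(const unsigned char s[3])
{
    if ((s[0] & 0xF0) != 0xE0 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
        return -1;
    return ((int32_t)(s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
}

/* UTF-16 high (leading) surrogates occupy 0xD800..0xDBFF. */
bool is_high_surrogate(int32_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}
```

With these helpers, {0xEF, 0x80, 0x80} decodes to 0xF000 (not a surrogate) and {0xED, 0xAD, 0x99} decodes to 0xDB59 (a high surrogate). The five names in the next test, \xED\xAD\x90 to \xED\xAD\x94, decode to 0xDB50..0xDB54 in the same way.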

$ touch $'t-\xed\xad\x99'
$ touch $'t-\xef\x80\x80'
$ ls | uniq -c
      2 t-

This no longer occurs in 3.7.0-0.165.g1b60f4861b70, but see below.
...
...
I'll apply the patch shortly.

$ touch $'t-\xed\xad\x90'
$ touch $'t-\xed\xad\x91'
$ touch $'t-\xed\xad\x92'
$ touch $'t-\xed\xad\x93'
$ touch $'t-\xed\xad\x94'
$ ls | uniq -c
      5 t-

$ ls -s
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
ls: cannot access 't-': No such file or directory
total 0
? t-  ? t-  ? t-  ? t-  ? t-

All failures found in several runs of the attached test program with different seeds have one thing in common: the Windows path name contains an invalid (unpaired) word in the UTF-16 high surrogate range 0xD800-0xDBFF:

$ ./randnames 42
$'t-\xEC\x9E\xB3\xEF\x82\x80\xEF\x83\xA0': access() failed, errno=2:
$'t-\xED\xA4\xA8\x80\xE0': original path
L"t-\xD928\xF080\xF0E0": Windows path

$'t-\xEE\x9E\xB3\xEF\x83\xA1': access() failed, errno=2:
$'t-\xED\xA6\xB0\xE1': original path
L"t-\xD9B0\xF0E1": Windows path
...
$'t-\xE7\xBE\xB3\xEF\x82\xB3': access() failed, errno=2:
$'t-\xED\xA2\x96\xB3': original path
L"t-\xD896\xF0B3": Windows path
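The Windows paths listed above fit a simple rule, which I assume here: well-formed sequences are decoded, including the 3-byte form of a surrogate, and every other byte b becomes the private use code unit 0xF000 | b. The following C sketch reproduces the three paths under that assumption and checks for the property they share. to_windows_name() and has_lone_high_surrogate() are hypothetical names. This is not Cygwin's actual conversion code: 4-byte sequences and Cygwin's other file name special cases are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte -> UTF-16 mapping (assumption based on the examples):
   well-formed 1- to 3-byte sequences are decoded (surrogate values included),
   any other byte b becomes 0xF000 | b.  4-byte sequences are not handled.
   out must have room for n code units.  Returns the number of units written. */
size_t to_windows_name(const unsigned char *s, size_t n, uint16_t *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char c = s[i];
        if (c < 0x80) {
            out[o++] = c;
            i += 1;
        } else if (c >= 0xC2 && c <= 0xDF && i + 1 < n
                   && (s[i + 1] & 0xC0) == 0x80) {
            out[o++] = (uint16_t)(((c & 0x1F) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else if ((c & 0xF0) == 0xE0 && i + 2 < n
                   && (s[i + 1] & 0xC0) == 0x80 && (s[i + 2] & 0xC0) == 0x80
                   && (c != 0xE0 || s[i + 1] >= 0xA0)) {   /* reject overlong */
            out[o++] = (uint16_t)(((c & 0x0F) << 12) | ((s[i + 1] & 0x3F) << 6)
                                  | (s[i + 2] & 0x3F));
            i += 3;
        } else {
            out[o++] = (uint16_t)(0xF000 | c);   /* invalid byte -> U+F0xx */
            i += 1;
        }
    }
    return o;
}

/* True if w[0..n) has a high surrogate that is not followed by a low one. */
bool has_lone_high_surrogate(const uint16_t *w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bool high = w[i] >= 0xD800 && w[i] <= 0xDBFF;
        bool next_low = i + 1 < n && w[i + 1] >= 0xDC00 && w[i + 1] <= 0xDFFF;
        if (high && !next_low)
            return true;
    }
    return false;
}
```

For example, the bytes of $'t-\xED\xA4\xA8\x80\xE0' map to the five units L"t-\xD928\xF080\xF0E0", and has_lone_high_surrogate() returns true because 0xD928 is followed by 0xF080, not by a low surrogate.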


A closer look reveals two problems:

1.) A lone high surrogate is not encoded correctly. This could be fixed with this patch:
https://cygwin.com/pipermail/cygwin-patches/2025q2/014001.html

2.) A high surrogate at the very end of the string is not encoded at all. A fix would require enhancing the interface between the __*_wctomb() functions and the outer functions: the outer loop would need to call the function once more after L'\0' has been reached.
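Both problems can be illustrated with a small stateful UTF-16 to UTF-8 encoder. This is a hypothetical sketch, not newlib's __*_wctomb() code and not the linked patch; u16_state, conv_unit(), conv_flush() and utf16_to_bytes() are made-up names. A high surrogate has to be held back until the next code unit shows whether it starts a pair. If it does not (item 1), the held surrogate is emitted on its own, here in the same 3-byte form the original paths above use. If the string ends while one is still held (item 2), only an explicit final flush emits it. Cygwin's mapping of U+F0xx back to single bytes is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion state: a pending high surrogate, or 0 if none. */
typedef struct { uint16_t pending_high; } u16_state;

/* Write cp as UTF-8; surrogate values get the generic 3-byte form. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert one UTF-16 unit; a high surrogate is held back in *st. */
static size_t conv_unit(uint16_t wc, u16_state *st, unsigned char *out)
{
    size_t len = 0;
    if (st->pending_high) {
        if (wc >= 0xDC00 && wc <= 0xDFFF) {          /* valid pair */
            uint32_t cp = 0x10000 + (((uint32_t)st->pending_high - 0xD800) << 10)
                                  + (wc - 0xDC00);
            st->pending_high = 0;
            return put_utf8(cp, out);
        }
        len = put_utf8(st->pending_high, out);       /* item 1: lone high */
        st->pending_high = 0;
    }
    if (wc >= 0xD800 && wc <= 0xDBFF) {
        st->pending_high = wc;
        return len;
    }
    return len + put_utf8(wc, out + len);
}

/* Item 2: emit a high surrogate still pending at the end of the string. */
static size_t conv_flush(u16_state *st, unsigned char *out)
{
    size_t len = 0;
    if (st->pending_high) {
        len = put_utf8(st->pending_high, out);
        st->pending_high = 0;
    }
    return len;
}

/* Convert n units; out needs room for 3 * n bytes (no terminator added). */
size_t utf16_to_bytes(const uint16_t *w, size_t n, unsigned char *out, int flush)
{
    u16_state st = {0};
    size_t o = 0;
    for (size_t i = 0; i < n; i++)
        o += conv_unit(w[i], &st, out + o);
    if (flush)
        o += conv_flush(&st, out + o);
    return o;
}
```

With w = L"t-\xDB59", utf16_to_bytes(w, 3, buf, 0) returns 2 (just "t-", the truncated name seen above), while flush = 1 returns 5 ("t-\xED\xAD\x99"). A lone high surrogate followed by another unit, e.g. L"\xD928A", gives "\xED\xA4\xA8A" in either mode.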

BTW: if the file name consists only of a single high surrogate, an interesting corner case of readdir() is visible:

$ echo foo >$'\uD876' # Windows name: L"\xD876"
$ cat $'\uD876'
foo
$ ls
$ ls -a | uniq -c
      1 .
      2 ..

--
Regards,
Christian
