Re: Question about slow access to file information
Eliot Moss via Cygwin wrote:
> On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> > Eliot Moss via Cygwin wrote:
> > > I have a separate drive mounted this way:
> > >
> > > d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> > >
> > > One thing I use it for is to store backup files. These tend to be 2 GB
> > > chunks, and there can be hundreds of them in the backup directory. (The
> > > drive is 5 TB.) The Windows Disk Management tool describes it as NTFS,
> > > Basic Data Partition.
> > >
> > > Doing ls (for example) takes a very perceptible number of seconds (though
> > > whatever takes a long time seems to be cached, at least for a while,
> > > since a second ls soon after is fast).
> >
> > The problem is the 'noacl' mount option and the fact that POSIX only
> > offers the *stat*() functions to retrieve file information. These
> > functions always need to provide the full file information, even if only
> > a small subset is needed.
> >
> > To determine the 'x'-permission bits in the 'stat.st_mode' field on a
> > 'noacl'-mount, Cygwin reads the first bytes of most files (all except
> > *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
> > (script), ":\n" (?) or "MZ" (Windows executable).
> >
> > On 'noacl' mounts, this behavior could be suppressed by 'exec' or 'noexec'
> > mount options.
>
> Interesting. I removed the noacl from /etc/fstab and restarted all Cygwin
> processes. The mount program now shows that drive without noacl. It still
> takes surprisingly long to ls if I have not done so recently. The directory
> contains ~1200 files.

This depends on the storage device, sometimes (on HDDs) on filesystem
fragmentation, and always on the 'ls' options.

Plain '/bin/ls' without any arguments does not call stat().
'ls -s' or 'ls --color=yes' call stat() for each file.
'ls -l' additionally calls getfacl() for each file if on an 'acl' mount.
The latter is apparently slower than expected, see below.

Here is a quick test on a directory with 1 ~3KB files on an NTFS USB drive
connected via USB-2 (~28MB/s raw read speed). The first test of each mount
variant was done immediately after connecting the drive:

$ TIMEFORMAT='%R'

1. mount [-o acl]

$ time ls -l > /dev/null
4.282

$ time ls -l > /dev/null
1.322

$ time ls -s > /dev/null
0.404

$ time ls > /dev/null
0.032

2. mount -o noacl

$ time ls -l > /dev/null
13.452

$ time ls -l > /dev/null
0.789

$ time ls -s > /dev/null
0.764

$ time ls > /dev/null
0.033

3. mount -o noacl,noexec

$ time ls -l > /dev/null
3.215

$ time ls -l > /dev/null
0.368

$ time ls -s > /dev/null
0.355

$ time ls > /dev/null
0.032

--
Regards,
Christian

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
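The measurements in the message above can be repeated on another directory with a small bash helper along these lines. This is only a sketch: the function name time_ls_variants is made up for illustration, it relies on bash's 'time' keyword and TIMEFORMAT just as the commands in the message do, and it does nothing about caching, so the first variant timed after connecting a drive will look slower than the rest.

```shell
#!/bin/bash
# Sketch: time a few 'ls' variants against one directory.
# time_ls_variants is a hypothetical helper name, not a Cygwin tool.
time_ls_variants() {
    local dir=$1
    local TIMEFORMAT='%R'   # report wall-clock seconds only
    local opts label elapsed
    for opts in "" "-s" "-l"; do
        label="ls${opts:+ $opts}"
        # The 'time' keyword writes its report to the stderr of the group,
        # which is captured here; the listing itself is discarded.
        elapsed=$( { time ls $opts -- "$dir" > /dev/null 2>&1; } 2>&1 )
        printf '%-8s %s\n' "$label" "$elapsed"
    done
}

# Example: time the directory given as first argument, or the current one.
time_ls_variants "${1:-.}"
```

Running it twice in a row on the same directory should show the cached/uncached difference reported in the message.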
Re: Question about slow access to file information
On Sun, Jan 15, 2023 at 12:05:10PM +1100, Eliot Moss via Cygwin wrote:
> On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> > Eliot Moss via Cygwin wrote:
> > > I have a separate drive mounted this way:
> > >
> > > d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> > >
> > > One thing I use it for is to store backup files. These tend to be 2 GB
> > > chunks, and there can be hundreds of them in the backup directory. (The
> > > drive is 5 TB.) The Windows Disk Management tool describes it as NTFS,
> > > Basic Data Partition.
> > >
> > > Doing ls (for example) takes a very perceptible number of seconds (though
> > > whatever takes a long time seems to be cached, at least for a while,
> > > since a second ls soon after is fast).
> >
> > The problem is the 'noacl' mount option and the fact that POSIX only
> > offers the *stat*() functions to retrieve file information. These
> > functions always need to provide the full file information, even if only
> > a small subset is needed.
> >
> > To determine the 'x'-permission bits in the 'stat.st_mode' field on a
> > 'noacl'-mount, Cygwin reads the first bytes of most files (all except
> > *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
> > (script), ":\n" (?) or "MZ" (Windows executable).
> >
> > On 'noacl' mounts, this behavior could be suppressed by 'exec' or 'noexec'
> > mount options.
>
> Interesting. I removed the noacl from /etc/fstab and restarted all Cygwin
> processes. The mount program now shows that drive without noacl. It still
> takes surprisingly long to ls if I have not done so recently. The directory
> contains ~1200 files.
>
> Further thoughts?

Does this make any difference?

$ env - LANG=C ls -f /cygdrive/d/

Also, ISTR prior mailing list postings on how cygwin may open() each file
to determine some info, and that can be expensive. Is that what is
happening if you trace the 'ls'?

Cheers,
Glenn
Re: Question about slow access to file information
On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> Eliot Moss via Cygwin wrote:
> > I have a separate drive mounted this way:
> >
> > d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> >
> > One thing I use it for is to store backup files. These tend to be 2 GB
> > chunks, and there can be hundreds of them in the backup directory. (The
> > drive is 5 TB.) The Windows Disk Management tool describes it as NTFS,
> > Basic Data Partition.
> >
> > Doing ls (for example) takes a very perceptible number of seconds (though
> > whatever takes a long time seems to be cached, at least for a while,
> > since a second ls soon after is fast).
>
> The problem is the 'noacl' mount option and the fact that POSIX only
> offers the *stat*() functions to retrieve file information. These
> functions always need to provide the full file information, even if only
> a small subset is needed.
>
> To determine the 'x'-permission bits in the 'stat.st_mode' field on a
> 'noacl'-mount, Cygwin reads the first bytes of most files (all except
> *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
> (script), ":\n" (?) or "MZ" (Windows executable).
>
> On 'noacl' mounts, this behavior could be suppressed by 'exec' or 'noexec'
> mount options.

Interesting. I removed the noacl from /etc/fstab and restarted all Cygwin
processes. The mount program now shows that drive without noacl. It still
takes surprisingly long to ls if I have not done so recently. The directory
contains ~1200 files.

Further thoughts?

EM
Re: Question about slow access to file information
Eliot Moss via Cygwin wrote:
> I have a separate drive mounted this way:
>
> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
>
> One thing I use it for is to store backup files. These tend to be 2 GB
> chunks, and there can be hundreds of them in the backup directory. (The
> drive is 5 TB.) The Windows Disk Management tool describes it as NTFS,
> Basic Data Partition.
>
> Doing ls (for example) takes a very perceptible number of seconds (though
> whatever takes a long time seems to be cached, at least for a while,
> since a second ls soon after is fast).

The problem is the 'noacl' mount option and the fact that POSIX only
offers the *stat*() functions to retrieve file information. These
functions always need to provide the full file information, even if only
a small subset is needed.

To determine the 'x'-permission bits in the 'stat.st_mode' field on a
'noacl'-mount, Cygwin reads the first bytes of most files (all except
*.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
(script), ":\n" (?) or "MZ" (Windows executable).

On 'noacl' mounts, this behavior could be suppressed by 'exec' or 'noexec'
mount options.

--
Regards,
Christian
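To make the cost concrete, the content check described in the message above can be imitated in a few lines of shell. This is an illustrative sketch only, not Cygwin's actual code: looks_executable is a made-up name, the special handling of *.exe, *.lnk and *.com is left out, and the point is simply that every file has to be opened and read before its 'x' bits are known.

```shell
# Sketch: return 0 if a file starts with "#!", ":\n" or "MZ", else 1.
# looks_executable is a hypothetical helper, not part of Cygwin.
looks_executable() {
    # Dump the first two bytes as hex, e.g. "#!" becomes "2321".
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    case $magic in
        2321|3a0a|4d5a) return 0 ;;   # "#!", ":\n", "MZ"
        *)              return 1 ;;   # anything else, including empty files
    esac
}

# Example: report which of the files given as arguments would get 'x' bits.
for f in "$@"; do
    if looks_executable "$f"; then
        printf '%s: x\n' "$f"
    else
        printf '%s: -\n' "$f"
    fi
done
```

One open and one short read per file is cheap on a local SSD, but on a slow or freshly connected drive it adds up across a directory with many entries.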
Re: Question about slow access to file information
On Sat, Jan 14, 2023 at 11:42:58AM +1100, Eliot Moss via Cygwin wrote:
> Dear Cygwin'ers -
>
> I have a separate drive mounted this way:
>
> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
>
> One thing I use it for is to store backup files. These tend to be 2 GB
> chunks, and there can be hundreds of them in the backup directory. (The
> drive is 5 TB.) The Windows Disk Management tool describes it as NTFS,
> Basic Data Partition.
>
> Doing ls (for example) takes a very perceptible number of seconds (though
> whatever takes a long time seems to be cached, at least for a while, since a
> second ls soon after is fast).
>
> Windows Explorer (for example) and CMD do not seem to suffer this delay.
>
> Any notion as to what is happening and what I might do to ameliorate it?
>
> If it matters, the drive is removable (an external WD MyPassport hard drive).

I *suspect* this will be an issue with `ls` querying some file metadata
that is relatively slow to get out of an NTFS system, in order to provide
an interface similar to native *nix systems, whereas Windows' tools
unsurprisingly care more about the sorts of file properties that Windows
filesystems are better optimised for.

Based on experience, you might find using `ls --color=never` to be
quicker: querying some of the properties that `ls` likes to use for
colouring the output seems to require a bunch of extra queries to the
filesystem.

Failing that, if you have control over the directory layout, making the
structure deeper with fewer objects in each directory will probably help.
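As an illustration of the last suggestion, a flat directory could be split into subdirectories with something like the sketch below. Everything in it is an assumption rather than a recommendation from the thread: shard_by_prefix is a made-up name, bucketing by the first two characters of the file name (with a ".d" suffix) is an arbitrary choice, and it only makes sense if whatever writes and reads the files can cope with them living one level deeper.

```shell
# Sketch: move each regular file in a directory into a subdirectory named
# after the first two characters of its name, e.g. backup001 -> ba.d/backup001.
# shard_by_prefix is a hypothetical helper; try it on a copy first.
shard_by_prefix() {
    dir=$1
    for file in "$dir"/*; do
        [ -f "$file" ] || continue              # skip subdirectories and an unmatched glob
        name=${file##*/}
        bucket="$(printf '%.2s' "$name").d"     # first two characters plus ".d"
        mkdir -p "$dir/$bucket" || continue     # leave the file alone if the bucket cannot be made
        mv -- "$file" "$dir/$bucket/$name"
    done
}

# Example (not run here): shard_by_prefix /cygdrive/d/backups
```

If all file names share a long common prefix, two leading characters will not spread them out; a bucket derived from a differing part of the name would be needed instead.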