Re: Question about slow access to file information

2023-01-17 Thread Christian Franke via Cygwin

Eliot Moss via Cygwin wrote:
> On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> > Eliot Moss via Cygwin wrote:
> > > I have a separate drive mounted this way:
> > >
> > > d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> > >
> > > One thing I use it for is to store backup files.  These tend to be 2 GB
> > > chunks, and there can be hundreds of them in the backup directory.  (The
> > > drive is 5 TB.)  The Windows Disk Management tool describes it as NTFS,
> > > Basic Data Partition.
> > >
> > > Doing ls (for example) takes a very perceptible number of seconds (though
> > > whatever takes a long time seems to be cached, at least for a while, since
> > > a second ls soon after is fast).
> >
> > The problem is the 'noacl' mount option and the fact that POSIX only
> > offers the *stat*() functions to retrieve file information. These
> > functions always need to provide the full file information, even if only
> > a small subset is needed.
> >
> > To determine the 'x'-permission bits in the 'stat.st_mode' field on a
> > 'noacl'-mount, Cygwin reads the first bytes of most files (all except
> > *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
> > (script), ":\n" (?) or "MZ" (Windows executable).
> >
> > On 'noacl' mounts, this behavior could be suppressed by 'exec' or
> > 'noexec' mount options.
>
> Interesting.  I removed the noacl from /etc/fstab and restarted all Cygwin
> processes.  The mount program now shows that drive without noacl.  It still
> takes surprisingly long to ls if I have not done so recently.  The directory
> contains ~1200 files.


This depends on storage device, sometimes (HDD) on filesystem 
fragmentation and always on 'ls' options. Plain '/bin/ls' without any 
arguments does not call stat(). 'ls -s' or 'ls --color=yes' call stat() 
for each file. 'ls -l' additionally calls getfacl() for each file if on 
an 'acl' mount. The latter is apparently slower than expected, see below.


Here is a quick test on a directory with 1 ~3KB files on an NTFS USB 
drive connected via USB-2 (~28MB/s raw read speed). The first test of 
each mount variant was done immediately after connecting the drive:


$ TIMEFORMAT='%R'

1. mount [-o acl]

$ time ls -l > /dev/null
4.282
$ time ls -l > /dev/null
1.322
$ time ls -s > /dev/null
0.404
$ time ls > /dev/null
0.032


2. mount -o noacl

$ time ls -l > /dev/null
13.452
$ time ls -l > /dev/null
0.789
$ time ls -s > /dev/null
0.764
$ time ls > /dev/null
0.033


3. mount -o noacl,noexec

$ time ls -l > /dev/null
3.215
$ time ls -l > /dev/null
0.368
$ time ls -s > /dev/null
0.355
$ time ls > /dev/null
0.032

--
Regards,
Christian


--
Problem reports:  https://cygwin.com/problems.html
FAQ:              https://cygwin.com/faq/
Documentation:    https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple


Re: Question about slow access to file information

2023-01-14 Thread gs-cygwin.com--- via Cygwin
On Sun, Jan 15, 2023 at 12:05:10PM +1100, Eliot Moss via Cygwin wrote:
> On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> > Eliot Moss via Cygwin wrote:
> > > I have a separate drive mounted this way:
> > >
> > > d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> > >
> > > One thing I use it for is to store backup files.  These tend to be 2 GB
> > > chunks, and there can be hundreds of them in the backup directory.  (The
> > > drive is 5 TB.)  The Windows Disk Management tool describes it as NTFS,
> > > Basic Data Partition.
> > >
> > > Doing ls (for example) takes a very perceptible number of seconds (though
> > > whatever takes a long time seems to be cached, at least for a while, since
> > > a second ls soon after is fast).
> >
> > The problem is the 'noacl' mount option and the fact that POSIX only
> > offers the *stat*() functions to retrieve file information. These
> > functions always need to provide the full file information, even if only
> > a small subset is needed.
> >
> > To determine the 'x'-permission bits in the 'stat.st_mode' field on a
> > 'noacl'-mount, Cygwin reads the first bytes of most files (all except
> > *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
> > (script), ":\n" (?) or "MZ" (Windows executable).
> >
> > On 'noacl' mounts, this behavior could be suppressed by 'exec' or
> > 'noexec' mount options.
>
> Interesting.  I removed the noacl from /etc/fstab and restarted all Cygwin
> processes.  The mount program now shows that drive without noacl.  It still
> takes surprisingly long to ls if I have not done so recently.  The directory
> contains ~1200 files.
> 
> Further thoughts?

Does this make any difference?
$ env - LANG=C ls -f /cygdrive/d/

Also, ISTR prior mailing list postings on how cygwin may open() each
file to determine some info, and that can be expensive.  Is that what is
happening if you trace the 'ls'?

Cheers, Glenn



Re: Question about slow access to file information

2023-01-14 Thread Eliot Moss via Cygwin

On 1/15/2023 3:38 AM, Christian Franke via Cygwin wrote:
> Eliot Moss via Cygwin wrote:
> > I have a separate drive mounted this way:
> >
> > d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> >
> > One thing I use it for is to store backup files.  These tend to be 2 GB
> > chunks, and there can be hundreds of them in the backup directory.  (The
> > drive is 5 TB.)  The Windows Disk Management tool describes it as NTFS,
> > Basic Data Partition.
> >
> > Doing ls (for example) takes a very perceptible number of seconds (though
> > whatever takes a long time seems to be cached, at least for a while, since
> > a second ls soon after is fast).
>
> The problem is the 'noacl' mount option and the fact that POSIX only
> offers the *stat*() functions to retrieve file information. These
> functions always need to provide the full file information, even if only
> a small subset is needed.
>
> To determine the 'x'-permission bits in the 'stat.st_mode' field on a
> 'noacl'-mount, Cygwin reads the first bytes of most files (all except
> *.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!"
> (script), ":\n" (?) or "MZ" (Windows executable).
>
> On 'noacl' mounts, this behavior could be suppressed by 'exec' or
> 'noexec' mount options.

Interesting.  I removed the noacl from /etc/fstab and restarted all Cygwin
processes.  The mount program now shows that drive without noacl.  It still
takes surprisingly long to ls if I have not done so recently.  The directory
contains ~1200 files.

Further thoughts?

EM



Re: Question about slow access to file information

2023-01-14 Thread Christian Franke via Cygwin

Eliot Moss via Cygwin wrote:
> I have a separate drive mounted this way:
>
> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
>
> One thing I use it for is to store backup files.  These tend to be 2 GB
> chunks, and there can be hundreds of them in the backup directory.  (The
> drive is 5 TB.)  The Windows Disk Management tool describes it as NTFS,
> Basic Data Partition.
>
> Doing ls (for example) takes a very perceptible number of seconds (though
> whatever takes a long time seems to be cached, at least for a while, since
> a second ls soon after is fast).


The problem is the 'noacl' mount option and the fact that POSIX only 
offers the *stat*() functions to retrieve file information. These 
functions always need to provide the full file information, even if only 
a small subset is needed.


To determine the 'x'-permission bits in the 'stat.st_mode' field on a 
'noacl'-mount, Cygwin reads the first bytes of most files (all except 
*.exe, *.lnk, *.com). The 'x' bits are set if the file starts with "#!" 
(script), ":\n" (?) or "MZ" (Windows executable).


On 'noacl' mounts, this behavior could be suppressed by 'exec' or 
'noexec' mount options.
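
The first-bytes check described above can be illustrated with a small shell sketch. This is not Cygwin's implementation; it only mimics the three signatures listed in this message, and the function name looks_executable is made up for the example. It also shows why the check is costly: every file has to be opened and read just to fill in the 'x' bits.

```shell
# Hypothetical helper, NOT Cygwin's code: succeed if FILE starts with one of
# the signatures named above ("#!", ":\n" or "MZ").
looks_executable() {
    # Dump the first two bytes as hex so the newline in ":\n" survives
    # command substitution.
    magic=$(head -c 2 < "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        2321) return 0 ;;   # "#!"  script
        3a0a) return 0 ;;   # ":\n"
        4d5a) return 0 ;;   # "MZ"  Windows executable
        *)    return 1 ;;   # anything else, including empty files
    esac
}
```

For example, `looks_executable somefile && echo x` prints `x` only for files that begin with one of those signatures.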


--
Regards,
Christian




Re: Question about slow access to file information

2023-01-14 Thread Adam Dinwoodie via Cygwin
On Sat, Jan 14, 2023 at 11:42:58AM +1100, Eliot Moss via Cygwin wrote:
> Dear Cygwin'ers -
> 
> I have a separate drive mounted this way:
> 
> d:/ /cygdrive/d ntfs binary,posix=0,user,noacl,auto 0 0
> 
> One thing I use it for is to store backup files.  These tend to be 2 GB
> chunks, and there can be hundreds of them in the backup directory.  (The drive
> is 5 TB.)  The Windows Disk Management tool describes it as NTFS, Basic Data
> Partition.
> 
> Doing ls (for example) takes a very perceptible number of seconds (though
> whatever takes a long time seems to be cached, at least for a while, since a
> second ls soon after is fast).
> 
> Windows Explorer (for example) and CMD do not seem to suffer this delay.
> 
> Any notion as to what is happening and what I might do to ameliorate it?
> 
> If it matters, the drive is removable (an external WD MyPassport hard drive).

I *suspect* this is an issue with `ls` querying file metadata that is
relatively slow to get out of an NTFS filesystem, in order to provide an
interface similar to native *nix systems, whereas Windows' tools
unsurprisingly care more about the sorts of file properties that Windows
filesystems are better optimised for.

Based on experience, you might find using `ls --color=never` to be
quicker: querying some of the properties that `ls` likes to use for
colouring the output seems to require a bunch of extra queries to the
filesystem.  Failing that, if you have control over the directory
layout, making the structure deeper with fewer objects in each directory
will probably help.
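
As a rough sketch of the "deeper structure" idea, the function below moves the files of one flat directory into subdirectories of a second directory, keyed by the first two characters of each file name. The name shard_dir, the two-character key and the separate destination directory are all assumptions made for illustration; pick a key that actually spreads your file names, and check that whatever writes the backups can cope with the new layout.

```shell
# Hypothetical helper: move each regular file in SRC into DEST/<prefix>/,
# where <prefix> is the first two characters of the file name.
shard_dir() {
    src=$1
    dest=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue        # skip subdirectories and an unmatched glob
        name=${f##*/}                  # strip the directory part
        prefix=$(printf '%s' "$name" | cut -c1-2)
        mkdir -p "$dest/$prefix" || return 1
        mv "$f" "$dest/$prefix/$name" || return 1
    done
}
```

Calling `shard_dir /cygdrive/d/backup /cygdrive/d/backup-sharded` (both paths are placeholders) leaves subdirectories and dotfiles in the source directory untouched.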
