On Wed, Feb 26, 2025 at 9:26 PM Corinna Vinschen via Cygwin <cygwin@cygwin.com> wrote:
> On Feb 26 20:16, Roland Mainz via Cygwin wrote:
> > Something is wrong with UNC paths in Cygwin 3.6 build
> > 3.6.0-0.404.g323729f654ae.x86_64 and ms-nfs41-client 2025-02-21:
> >
> > Example:
> > ---- snip ----
> > $ cd //derfwnb4966_ipv4.global.loc@2049/nfs4/bigdisk/builds/bash_build1
> > $ rm -Rfv bash
> > removed 'bash/builtins/hash.o'
> > removed 'bash/builtins/fc.o'
> > removed 'bash/builtins/jobs.o'
> > removed 'bash/builtins/help.o'
> > removed 'bash/builtins/history.o'
> > removed 'bash/builtins/let.o'
> > removed 'bash/builtins/kill.o'
> > removed 'bash/builtins/mapfile.o'
> > removed 'bash/builtins/pushd.o'
> > removed 'bash/builtins/read.o'
> > removed 'bash/builtins/return.o'
> > removed 'bash/builtins/setattr.o'
> > ---- snip ----
> >
> > Each line from rm -Rfv now needs around 3-5 seconds. Switching to CWD
> > //derfwnb4966_ipv4@2049/nfs4/bigdisk/builds/bash_build1 (e.g. non-FQDN
> > for share host) does not fix the problem. Going to
> > /cygdrive/l/builds/bash_build1 and the rm -Rfv deletes around 70-90
> > files per second.
> > Last tested version was Cygwin commit
> > #4bcc6adec765ee8354bfc9e6c060c9d1c0c7bf46 (and same ms-nfs41-client
> > 2025-02-21 release), rm -Rfv performance there was normal (same as on
> > /cygdrive/l/...).
>
> I can't reproduce this on MS NFS and SMB.
> You should try this with the latest 406 build.
Done, works perfectly - see below, the issue was something different...

> If this doesn't fix
> it, I need to know which test release introduced this problem.

It turns out this is something very NFSv4.x-specific: delegations!

An NFSv4.x server can grant a client a "read" or "write" delegation. If the server grants such a delegation, the client can cache reads (or reads and writes) locally. If another client then touches a file for which the server has granted a delegation, the delegation must be recalled first; until that happens, the second caller gets |NFS4ERR_DELAY| and has to retry after a delay.

And that is exactly what happens in this case, because the mounts I used for testing are (intentionally [1]) treated as separate clients.

[1]=First, it's my test setup, and second, ms-nfs41-client is highly threaded to deal with the async nature of the Windows kernel and with NFSv4.x's ability to handle requests in parallel, the only limiting factor being TCP and the RPC implementation's locking.

Wireshark documented this nicely:

---- snip ----
No.   Time       Source         Destination    Protocol Length Info
1143  85.561066  10.49.202.232  10.49.202.230  NFS  318  V4 Call (Reply In 1145) remove REMOVE DH: 0x4b1e488f/caller.def
1144  85.561419  10.49.202.230  10.49.202.232  RPC  222  Continuation
1145  85.591706  10.49.202.230  10.49.202.232  NFS  174  V4 Reply (Call In 1143) remove REMOVE Status: NFS4ERR_DELAY
1146  85.615056  10.49.202.232  10.49.202.230  TCP   66  604 → 2049 [ACK] Seq=6277 Ack=4741 Win=32763 Len=0 TSval=2579336 TSecr=3137316992
1147  85.646291  10.49.202.232  10.49.202.230  TCP   66  603 → 2049 [ACK] Seq=22273 Ack=14733 Win=32768 Len=0 TSval=2579368 TSecr=3137317023
1149  86.099396  10.49.202.232  10.49.202.230  NFS  318  V4 Call remove REMOVE DH: 0x4b1e488f/caller.def
1150  86.131692  10.49.202.230  10.49.202.232  NFS  174  V4 Reply (Call In 1149) remove REMOVE Status: NFS4ERR_DELAY
1151  86.177304  10.49.202.232  10.49.202.230  TCP   66  603 → 2049 [ACK] Seq=22525 Ack=14841 Win=32767 Len=0 TSval=2579899 TSecr=3137317563
1158  87.146588  10.49.202.232  10.49.202.230  NFS  318  V4 Call (Reply In 1159) remove REMOVE DH: 0x4b1e488f/caller.def
1159  87.175843  10.49.202.230  10.49.202.232  NFS  174  V4 Reply (Call In 1158) remove REMOVE Status: NFS4ERR_DELAY
1160  87.224483  10.49.202.232  10.49.202.230  TCP   66  603 → 2049 [ACK] Seq=22777 Ack=14949 Win=32767 Len=0 TSval=2580946 TSecr=3137318607
1170  88.677605  10.49.202.232  10.49.202.230  NFS  318  V4 Call (Reply In 1171) remove REMOVE DH: 0x4b1e488f/caller.def
1171  88.707859  10.49.202.230  10.49.202.232  NFS  174  V4 Reply (Call In 1170) remove REMOVE Status: NFS4ERR_DELAY
1172  88.755661  10.49.202.232  10.49.202.230  TCP   66  603 → 2049 [ACK] Seq=23029 Ack=15057 Win=32766 Len=0 TSval=2582477 TSecr=3137320139
1179  89.271735  10.49.202.232  10.49.202.230  RPC  174  Continuation
1180  89.272124  10.49.202.232  10.49.202.230  NFS  306  V4 Call (Reply In 1182) delegreturn DELEGRETURN StateID: 0xa200
1181  89.272396  10.49.202.230  10.49.202.232  TCP   66  2049 → 604 [ACK] Seq=4741 Ack=6625 Win=11624 Len=0 TSval=3137320703 TSecr=2582993
1182  89.272493  10.49.202.230  10.49.202.232  NFS  178  V4 Reply (Call In 1180) delegreturn DELEGRETURN
1183  89.318145  10.49.202.232  10.49.202.230  TCP   66  604 → 2049 [ACK] Seq=6625 Ack=4853 Win=32763 Len=0 TSval=2583040 TSecr=3137320703
1204  90.708871  10.49.202.232  10.49.202.230  NFS  318  V4 Call remove REMOVE DH: 0x4b1e488f/caller.def
1205  90.709587  10.49.202.230  10.49.202.232  NFS  314  V4 Reply (Call In 1204) remove REMOVE
1206  90.725892  10.49.202.232  10.49.202.230  NFS  314  V4 Call (Reply In 1208) remove REMOVE DH: 0x4b1e488f/cd.def
---- snip ----

So everything is working as expected, but for accurate performance benchmarking I really need an FSCTL_* to flush all delegations. Does Win32 have a "flush cache" |FSCTL_*|/|IOCTL_*|?

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) roland.ma...@nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple