Thank you, it's really great how fast you adapt the source and make patches for this. 
I saw so many posts where people did not get NFS41 working with ESXi and FreeBSD, 
and now I already have it running with your changes.

I have now compiled the kernel with all 4 patches, and it works now.

Some problems are still left:

- the "Server returned improper reason for no delegation: 2" warnings are still 
in the vmkernel.log.
                2018-03-08T11:41:20.290Z cpu0:68011 opID=488969b0)WARNING: 
NFS41: NFS41ValidateDelegation:608: Server returned improper reason for no 
delegation: 2

- can't delete a folder with the VMware host client datastore browser:
                2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.349Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.350Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.351Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: 
NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 
0x43046e4cb158: Transient file system condition, suggest retry
                2018-03-08T11:34:00.352Z cpu1:67981 opID=f5159ce3)WARNING: 
UserFile: 2155: hostd-worker: Directory changing too often to perform readdir 
operation (11 retries), returning busy

- after a reboot of the FreeBSD machine, ESXi does not restore the NFS 
datastore and logs the following warning (just disconnecting the links is fine):
                2018-03-08T12:39:44.602Z cpu23:66484)WARNING: NFS41: 
NFS41_Bug:2361: BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
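
To make it easier to compare the remaining problems before and after each patch, the NFS41 warnings can be tallied per function. This is only a minimal sketch: it assumes that each entry in the real vmkernel.log sits on a single line (the excerpts in this mail are wrapped by the mail client), and the helper name tally_warnings is made up for illustration.

```python
import re
from collections import Counter

# Matches NFS41 WARNING entries as they appear in the excerpts above, e.g.
# "...)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR ..."
# Other WARNING sources (such as "UserFile: 2155: ...") are deliberately ignored.
NFS41_WARNING_RE = re.compile(r"\)WARNING: NFS41: (\w+):(\d+): (.*)$")


def tally_warnings(lines):
    """Count NFS41 WARNING entries per (function name, source line number)."""
    counts = Counter()
    for line in lines:
        match = NFS41_WARNING_RE.search(line.rstrip("\n"))
        if match:
            func, lineno, _message = match.groups()
            counts[(func, int(lineno))] += 1
    return counts


def tally_file(path):
    """Convenience wrapper: tally the warnings found in a log file on disk."""
    with open(path, errors="replace") as handle:
        return tally_warnings(handle)
```

Calling tally_file("vmkernel.log").most_common() would then list the noisiest NFS41 functions first, which shows quickly whether, for example, the NFS41FileOpReaddir retries disappear with a new kernel.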

So far I have only run some quick benchmarks with ATTO in a Windows VM which 
has a vmdk on the NFS41 datastore, which is mounted over two 1GB links in 
different subnets.
Read is nearly double that of a single connection and write is just a bit 
faster. I don't know if write speed could be improved; currently the share is UFS 
on a HW RAID controller which has local write speeds of about 500MB/s.

The following link has the vmkernel.log covering mounting the NFS share, attaching a 
vmdk from the share to a Win VM, running the ATTO benchmark on it, 
disconnecting/reconnecting the network, and also the problem with the 
BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP after the reboot.
Up to the reboot I have also made a trace on one of the two links 
(nfs41_trace_before_reboot.pcap and nfs41_trace_after_reboot.pcap).

https://files.fm/u/wvybmdmc

andi

-----Original Message-----
From: Rick Macklem [mailto:rmack...@uoguelph.ca] 
Sent: Thursday, 8 March 2018 03:48
To: NAGY Andreas <andreas.n...@frequentis.com>; 'freebsd-stable@freebsd.org' 
<freebsd-stable@freebsd.org>
Subject: Re: NFS 4.1 RECLAIM_COMPLETE FS failed error in combination with ESXi 
client

NAGY Andreas wrote:
>attached the trace. If I see it correct it uses FORE_OR_BOTH. 
>(bctsa_dir: CDFC4_FORE_OR_BOTH (0x00000003))
Yes. The scary part is the ExchangeID before the BindConnectiontoSession.
(Normally that is only done at the beginning of a new mount to get a ClientID,  
followed immediately by a CreateSession. I don't know why it would do this?)

The attached patch might get BindConnectiontoSession to work. I have no way to 
test it beyond seeing it compile. Hopefully it will apply cleanly.

>The trace is only with the first patch, have not compiled the wantdeleg 
>patches so far.
That's fine. I don't think that matters much.

>I think this is related to the BIND_CONN_TO_SESSION; after a disconnect the 
>ESXi cannot connect to the NFS also with this warning:
>2018-03-07T16:55:11.227Z cpu21:66484)WARNING: NFS41: NFS41_Bug:2361: 
>BUG - Invalid BIND_CONN_TO_SESSION error: NFS4ERR_NOTSUPP
If the attached patch works, you'll find out what it fixes.

>Another thing I noticed today is that it is not possible to delete a folder 
>with the ESXi datastorebrowser on the NFS mount. Maybe it is a VMWare bug, 
>but with NFS3 it works.
>
>Here the vmkernel.log with only one connection contains mounting, trying to 
>delete a folder and disconnect:
>
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)World: 12235: VC opID c55dbe59 maps to vmkernel opID 55bea165
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:423: Mount server: 10.0.0.225, port: 2049, path: /, label: nfsds1, security: 1 user: , options: <none>
>2018-03-07T16:46:04.543Z cpu12:68008 opID=55bea165)StorageApdHandler: 977: APD Handle  Created with lock[StorageApd-0x43046e4c6d70]
>2018-03-07T16:46:04.544Z cpu11:66486)NFS41: NFS41ProcessClusterProbeResult:3873: Reclaiming state, cluster 0x43046e4c7ee0 [7]
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3791: Lease time: 120
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3792: Max read xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3793: Max write xfer size: 0x20000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3794: Max file size: 0x800000000000
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)NFS41: NFS41FSCompleteMount:3795: Max file name: 255
>2018-03-07T16:46:04.545Z cpu12:68008 opID=55bea165)WARNING: NFS41: NFS41FSCompleteMount:3800: The max file name size (255) of file system is larger than that of FSS (128)
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41FSAPDNotify:5960: Restored connection to the server 10.0.0.225 mount point nfsds1, mounted as 1a7893c8-eec764a7-0000-000000000000 ("/")
>2018-03-07T16:46:04.546Z cpu12:68008 opID=55bea165)NFS41: NFS41_VSIMountSet:435: nfsds1 mounted successfully
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)World: 12235: VC opID c55dbe91 maps to vmkernel opID e47706ec
>2018-03-07T16:47:19.869Z cpu21:67981 opID=e47706ec)WARNING: NFS41: NFS41FileOpReaddir:4728: Failed to process READDIR result for fh 0x43046e4c6
I have no idea if getting BindConnectiontoSession working will fix this or not?

rick
