Now I've also changed NFS_DIRBLKSIZ to 4k - no change.

harti

-----Original Message-----
From: Rick Macklem [mailto:rmack...@uoguelph.ca] 
Sent: Tuesday, May 14, 2013 2:50 PM
To: Brandt, Hartmut
Cc: curr...@freebsd.org
Subject: Re: files disappearing from ls on NFS

Hartmut Brandt wrote:
> On Mon, 13 May 2013, Rick Macklem wrote:
> 
> RM>Hartmut Brandt wrote:
> RM>> On Sun, 12 May 2013, Rick Macklem wrote:
> RM>>
> RM>> RM>Hartmut Brandt wrote:
> RM>> RM>> Hi,
> RM>> RM>>
> RM>> RM>> I've updated one of my -current machines this week (previous
> RM>> RM>> update was in February). Now I see a strange effect (it seems
> RM>> RM>> only on NFS mounts): ls or even echo * will list only some files
> RM>> RM>> (strangely enough, the first files from the normal,
> RM>> RM>> alphabetically ordered list). If I change something in the
> RM>> RM>> directory (delete a file or create a new one), the complete
> RM>> RM>> listing will appear for some time, but after some time (seconds
> RM>> RM>> to a minute or so) again only part of the files is listed.
> RM>> RM>>
> RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries is
> RM>> RM>> called only once (returning 4096). For a full listing,
> RM>> RM>> getdirentries is called 5 times, with the last call returning 0.
> RM>> RM>>
> RM>> RM>> I can still open files that are not listed if I know their name,
> RM>> RM>> though.
> RM>> RM>>
> RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS
> RM>> RM>> Server, which works without problems with all the other FreeBSD
> RM>> RM>> machines.
> RM>> RM>>
> RM>> RM>> So what could that be?
> RM>> RM>>
> RM>> RM>I've attached a patch that might be worth trying. It is a "shot in
> RM>> RM>the dark", but brings the new NFS client's readdir closer to the
> RM>> RM>old one (which you mentioned still works ok).
> RM>> RM>
> RM>> RM>Please let me know how it goes, if you have a chance to test it,
> RM>> RM>rick
> RM>>
> RM>> Hi Rick,
> RM>>
> RM>> the patch doesn't help.
> RM>>
> RM>> I wrote a small test program, which opens a directory, calls
> RM>> getdents(2) in a loop and dumps the result. I figured out that the
> RM>> return of the system call depends on the buffer size I pass to it.
> RM>> The directory has a block size of 4k according to fstat(2). If I
> RM>> use that, I get some 300 of the almost 500 directory entries. If I
> RM>> use 8k, I get just around 200, and if I use 16k I get a handful. If
> RM>> I dump the buffer in this case, I see 0x200 bytes filled with
> RM>> directory entries, then a lot of zeros, and starting from 0x1000
> RM>> data again. This is of course ignored because of the zeros before.
> RM>>
> RM>And for this case getdents(2) returned 16K? It is normal for
> RM>getdents(2) to return less than requested, and when end of dir
> RM>occurs, it should return 0.
> RM>
> RM>But if it returns 16K, there shouldn't be zeroed space in the middle
> RM>of it.
> RM>
> RM>And this always occurs or only after you wait a while? (You noted in
> RM>the above description that it would be ok for a little while after a
> RM>directory change and then would break, which suggests some kind of
> RM>caching problem.)
> 
> Today in the morning everything was fine. After waiting 5 minutes,
> again only partial directories. When I do a read with 8k buffer size,
> getdents(2) returns 8k, but starting from 0x200 until 0x1000 the
> buffer is filled with zeros. The entry just before the zeros ends
> exactly at 0x200 (that would be the first byte of the next entry) and
> at 0x1000 a new entry starts. The rest of the buffer is fine. The next
> read returns only 4k and seems to be fine, although it contains some
> junk non-zero bytes in the padding.
> 
Directory entries should never cross DIRBLKSIZ boundaries (512 or 0x200), so it 
makes sense that one ends at 0x200 and one starts at 0x1000. What doesn't make 
sense are the 0 bytes in between.
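
To illustrate why a zeroed gap hides everything after it, here is a minimal
sketch (not the libc or kernel code) of how a reader steps through a
getdents(2) buffer by d_reclen. It assumes the older FreeBSD struct dirent
layout (uint32 d_fileno, uint16 d_reclen, uint8 d_type, uint8 d_namlen, then
the name) on a little-endian machine. The names walk_dirents and make_dirent
are hypothetical helpers, and the buffer is synthetic data built to mimic the
reported symptom, not a real dump.

```python
import struct

DIRBLKSIZ = 512   # 0x200, the boundary entries must not cross
HDR = "<IHBB"     # assumed layout: d_fileno, d_reclen, d_type, d_namlen
HDR_LEN = struct.calcsize(HDR)  # 8 bytes


def make_dirent(fileno, name, reclen):
    """Build one synthetic directory entry, zero-padded to reclen bytes."""
    raw = name.encode("ascii")
    rec = struct.pack(HDR, fileno, reclen, 8, len(raw)) + raw  # 8 == DT_REG
    return rec.ljust(reclen, b"\0")


def walk_dirents(buf):
    """Step through buf entry by entry; return (names, offset where we stopped)."""
    names = []
    off = 0
    while off + HDR_LEN <= len(buf):
        fileno, reclen, _dtype, namlen = struct.unpack_from(HDR, buf, off)
        if reclen == 0:
            # Zeroed space: there is no record length to advance by,
            # so everything after this point is never seen.
            break
        if off // DIRBLKSIZ != (off + reclen - 1) // DIRBLKSIZ:
            raise ValueError(f"entry at {off:#x} crosses a DIRBLKSIZ boundary")
        if fileno != 0:  # d_fileno == 0 marks an unused entry
            name = buf[off + HDR_LEN: off + HDR_LEN + namlen]
            names.append(name.decode("ascii", "replace"))
        off += reclen
    return names, off


# Synthetic 8k buffer: entries up to 0x200, zeros until 0x1000, then more data.
buf = bytearray(8192)
buf[0:256] = make_dirent(1, "a.c", 256)
buf[256:512] = make_dirent(2, "b.c", 256)
buf[0x1000:0x1000 + 512] = make_dirent(3, "c.c", 512)

names, stop = walk_dirents(bytes(buf))
print(names, hex(stop))  # ['a.c', 'b.c'] 0x200
```

The walk stops at 0x200 and never reaches the entry at 0x1000, which matches
the "ignored because of the zeros before" behaviour described above.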

One difference between the old and new NFS clients, which the patch I sent you 
changed to the way the old one does it, is filling in the last block.
The old NFS client just leaves the block short and depends on n_direofoffset to 
recognize it is the last block with b_resid indicating where it ends.
For the new client (unless you've applied the patch I emailed you), it fills 
the rest of the last block in with "empty directories". This was in the OpenBSD 
code when I did the original NFSv4 stuff and port. I left it in, because I 
thought it might avoid problems if n_direofoffset was ever bogus. That is why 
there might be "different junk" at the end of the directory, but it shouldn't 
matter.

It almost sounds like something else is bzero()ing out part of the buffer cache 
block. Unless the directory has changed, the getdents() after 5 minutes would 
just return the same buffer cache block that was read in 5 minutes earlier 
(unless the buffer fell out of the cache and had to be re-read from the server, 
which would only happen if there was a lot of other file I/O going on during 
that 5 minutes).

A couple of comments:
- You can run "nfsstat -m" as root to see what the mount is actually
  configured to use. This might be worth looking at, to see if any
  of the values are "weird".
- One other difference between the old and new NFS clients is the
  value of NFS_DIRBLKSIZ. For the new one, it is 8K instead of 4K.
  You could change this in fs/nfs/nfsport.h, where it is defined,
  and then rebuild the sources to see if it has any effect. I can't
  see why it should matter, but??
- Maybe you could post your system configuration. Someone might spot
  something that changed in Feb.->Mar. related to your hardware/setup?

> Ten minutes later again everything is fine. I tried to spy on the NFS
> communication with tcpdump, but it seems unwilling to display
> anything useful about the NFS traffic. Is it able to decode the readdir
> stuff?
> 
To look at NFS packets you need wireshark. You can capture the packets with 
tcpdump using the -w option. Something like:
# tcpdump -s 0 -w file.pcap host server
- Then look at file.pcap in wireshark. (Often more convenient than
  installing wireshark on a particular machine.) If you'd like, you can
  email me the file.pcap and I can look at it.
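
Before mailing a capture around, it can be worth checking that the file
actually contains packets. The following is a rough sketch that counts the
records in a classic libpcap file, assuming tcpdump wrote the classic format
(24-byte global header, 16-byte per-packet headers) and not pcapng or the
nanosecond-timestamp variant. count_pcap_packets is a hypothetical helper name,
and the demo data is synthetic, not a real NFS capture.

```python
import struct


def count_pcap_packets(data):
    """Count complete packet records in a classic pcap capture held in memory."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"   # file written in little-endian byte order
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"   # file written in big-endian byte order
    else:
        raise ValueError("not a classic pcap file")
    off = 24           # skip the global header
    count = 0
    while off + 16 <= len(data):
        # Per-packet header: ts_sec, ts_usec, incl_len, orig_len
        _sec, _usec, incl_len, _orig = struct.unpack_from(endian + "IIII", data, off)
        if off + 16 + incl_len > len(data):
            break      # truncated final record
        off += 16 + incl_len
        count += 1
    return count


# Synthetic capture: a global header followed by three 4-byte packets.
header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
record = struct.pack("<IIII", 0, 0, 4, 4) + b"\x00" * 4
print(count_pcap_packets(header + record * 3))  # 3
```

For a real capture, the same function could be called as
count_pcap_packets(open("file.pcap", "rb").read()).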

rick

> harti
> 
> _______________________________________________
> freebsd-current@freebsd.org mailing list 
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to
> "freebsd-current-unsubscr...@freebsd.org"