Hi Greg,

Greg Kurz wrote on Mon, Jul 04, 2016 at 05:08:49PM +0200:
> On Mon, 4 Jul 2016 16:16:55 +0200
> Dominique Martinet <dominique.marti...@cea.fr> wrote:
>
> > I *think* this introduces a race somewhere, I'm getting errors like:
> > cat: f.05: No such file or directory
> > cat: f.14: No such file or directory
> > cat: f.13: No such file or directory
> > cat: f.39: No such file or directory
> > cat: f.05: No such file or directory
> >
> >
> > when doing:
> > for file in {01..50}; do touch f.${file}; done
> > seq 1 1000 | xargs -n 1 -P 25 -I{} cat f.* > /dev/null
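
For anyone wanting to rerun this, here is a minimal sketch of the reproducer quoted above as a standalone script. The DIR variable, the separate error log and the final error count are additions for illustration and are not part of the original commands; DIR is assumed to point at a directory on the 9p mount under test and falls back to a temporary directory otherwise.

```shell
#!/bin/bash
# Sketch of the quoted reproducer. DIR is a hypothetical variable: set it to
# a directory on the 9p mount under test (defaults to a fresh temp dir).
DIR="${DIR:-$(mktemp -d)}"
cd "$DIR" || exit 1

# Keep the error log outside DIR so it adds no extra traffic on the mount.
errlog="$(mktemp)"

# Create 50 empty files, f.01 .. f.50.
for file in {01..50}; do touch "f.${file}"; done

# Run 1000 "cat f.*" invocations, up to 25 in parallel. The {} placeholder
# is unused, so every invocation opens and reads all 50 files.
seq 1 1000 | xargs -n 1 -P 25 -I{} cat f.* 2> "$errlog" > /dev/null

# Count the ENOENT messages printed by cat.
failures=$(grep -c 'No such file or directory' "$errlog")
echo "ENOENT errors: ${failures}"
```

On a local filesystem the count should stay at zero, since nothing removes the files while the cat processes run; a non-zero count on the 9p mount would correspond to the errors quoted above.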

Ok so, tested with the first two patches and I can't seem to hit any
problem with the qemu server at least (I'd need more time to fix
ganesha's 9p tcp/rdma server before I could blame the client in any
way)

The last patch looks good to me, I think it only makes an existing race
more visible...

What I think could happen is:
 - process 1 has file open
 - process 2 tries to open file, sees fid open
 - process 1 closes file/clunk fids
 - process 2 tries to clone now-clunked fid and gets ENOENT

I'm afraid I just found out my hypervisor is no longer recent enough for
gdb kernel scripts (gdb 7.6 and python 2.7.5 in el7 compared to the
apparently required 7.7 and 2.7.6 respectively...), and I don't see
anything obvious with just debug messages/adding a few printks (wasn't
able to confirm where exactly that ENOENT comes from or if my theory is
even close to the truth)

I'd like to spend more time on it but don't think I'll be able to for a
couple of weeks; sorry about that.
Were you able to reproduce the problem?

Thanks,
-- 
Dominique