on 06/11/2013 21:26 Matthew Ahrens said the following: > On Tue, Oct 22, 2013 at 12:30 AM, Andriy Gapon <[email protected] > <mailto:[email protected]>> wrote: > > on 18/10/2013 20:43 Matthew Ahrens said the following: > > It's shared among everyone who has a hold on the dbuf. When there are > no more > > holds, the dbuf may be evicted, and call sa_evict() to get rid of the > shared sa > > handle. Saving the sa_handle_t in the dbufs' user pointer allows e.g. > > zfs_zget() to find it. Although I guess it does use "immediate evict", > so > you'd > > probably only find it if 2 threads are concurrently accessing the same > object? > > I would presume that is possible. > > So now I see it. Essentially SA_HDL_SHARED allows to find sa from dbuf > using > dmu_buf_get_user. But that does not imply sharing by itself. > > Some observations: > - immediate evict is used indeed > - sa_evict consists only of a call to panic, so I guess that it is not > expected > to be evicted before sa_handle_destroy is called > - all calls to sa_handle_get_from_db(SA_HDL_SHARED) happen within > ZFS_OBJ_HOLD > > I guess that the implication is that a znode normally[*] maintains a hold > on the > dbuf that backs the sa handle and thus there can be no eviction. > [*] - this is not true only between zfs_suspend_fs and zfs_resume_fs. > > > Have you tried adding an assertion (or dtracing) that there is no > sharing > (i.e. > > that we never find an existing sa handle attached to the dbuf, from > zfs_zget() > > or sa_handle_get_from_db())? You would probably need a workload that > accesses > > the same object concurrently, perhaps over NFS (so it has to use > zfs_zget() > > rather than having the vnode in hand). That should show you if / how > the > > sharing code is exercised. > > Yes, I have tried that and could not detect any sharing occurring in > sa_handle_get_from_db. > > Looking at zfs_zget specifically it does the following (many details > omitted): > ZFS_OBJ_HOLD_ENTER > sa_buf_hold > hdl = dmu_buf_get_user > if hdl != NULL > ... > else > ... > sa_handle_get_from_db(SA_HDL_SHARED) > ... > ZFS_OBJ_HOLD_EXIT > > So no sharing should happen in sa_handle_get_from_db, because of the > locking and > checking in zfs_zget. > > To summarize. A sa handle can not be shared between two different > znodes. So, > sharing logic in sa_handle_get_from_db seems to be redundant given the > logic of > zfs_zget. In more strong words, I believe that allowing sharing in > sa_handle_get_from_db is a bug because it could mask other bugs. > > > > That makes sense, but I'd like to see an analysis of all code paths that can > call sa_handle_get_from_db() before we remove this. E.g. zfs_mknode(). > > It seems like the object lock (ZFS_OBJ_HOLD_ENTER()) would need to be held > when > sa_handle_get_from_db(SA_HDL_SHARED) is called. Have you tried adding an > assertion to that effect?
I haven't added the assertion yet, but I see no problem with doing so, so I will do it. Given that sa_handle_get_from_db(SA_HDL_SHARED) is called from just two places http://src.illumos.org/source/search?q=&defs=&refs=SA_HDL_SHARED&path=&hist=&project=illumos-gate it was rather trivial to do manual static analysis :-) zfs_znode_sa_init already has the assertion. In zfs_mknode an object is held quite early http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#829 and the hold is kept until the end of the function, so the sa_handle_get_from_db call on line 893 is definitely covered. > Could you explain the comment about the race condition in zfs_zget()? Is it > accurate? It seems like it is referring to the time between when zfs_mknode() > calls dmu_object_alloc() (or equivalent) and when it does > ZFS_OBJ_HOLD_ENTER(). > It appears that if zfs_zget() is called during that time, it will instantiate > the SA, and then zfs_mknode() share it. I think that this is the race that the comment refers to, but the handling of the race seems to be different from what you described: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#672 It seems that zfs_znode_alloc would indeed instantiate the SA (via zfs_znode_sa_init), but then it would see uninitialized SA data and so it would destroy the SA handle and return NULL. zfs_zget translates that NULL to ENOENT, which makes sense because the file is not really fully created yet. Then, after getting hold of the object, zfs_mknode would create a new handle and then actually populate the SA. So no sharing happens in this case either. -- Andriy Gapon _______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
