On Sat, Nov 30, 2013 at 2:05 AM, Andriy Gapon <[email protected]> wrote:
> on 06/11/2013 21:26 Matthew Ahrens said the following: > > On Tue, Oct 22, 2013 at 12:30 AM, Andriy Gapon <[email protected] > > <mailto:[email protected]>> wrote: > > > > on 18/10/2013 20:43 Matthew Ahrens said the following: > > > It's shared among everyone who has a hold on the dbuf. When there > are no more > > > holds, the dbuf may be evicted, and call sa_evict() to get rid of > the > > shared sa > > > handle. Saving the sa_handle_t in the dbufs' user pointer allows > e.g. > > > zfs_zget() to find it. Although I guess it does use "immediate > evict", so > > you'd > > > probably only find it if 2 threads are concurrently accessing the > same object? > > > I would presume that is possible. > > > > So now I see it. Essentially SA_HDL_SHARED allows to find sa from > dbuf using > > dmu_buf_get_user. But that does not imply sharing by itself. > > > > Some observations: > > - immediate evict is used indeed > > - sa_evict consists only of a call to panic, so I guess that it is > not expected > > to be evicted before sa_handle_destroy is called > > - all calls to sa_handle_get_from_db(SA_HDL_SHARED) happen within > ZFS_OBJ_HOLD > > > > I guess that the implication is that a znode normally[*] maintains a > hold on the > > dbuf that backs the sa handle and thus there can be no eviction. > > [*] - this is not true only between zfs_suspend_fs and zfs_resume_fs. > > > > > Have you tried adding an assertion (or dtracing) that there is no > sharing > > (i.e. > > > that we never find an existing sa handle attached to the dbuf, > from zfs_zget() > > > or sa_handle_get_from_db())? You would probably need a workload > that accesses > > > the same object concurrently, perhaps over NFS (so it has to use > zfs_zget() > > > rather than having the vnode in hand). That should show you if / > how the > > > sharing code is exercised. > > > > Yes, I have tried that and could not detect any sharing occurring in > > sa_handle_get_from_db. > > > > Looking at zfs_zget specifically it does the following (many details > omitted): > > ZFS_OBJ_HOLD_ENTER > > sa_buf_hold > > hdl = dmu_buf_get_user > > if hdl != NULL > > ... > > else > > ... > > sa_handle_get_from_db(SA_HDL_SHARED) > > ... > > ZFS_OBJ_HOLD_EXIT > > > > So no sharing should happen in sa_handle_get_from_db, because of the > locking and > > checking in zfs_zget. > > > > To summarize. A sa handle can not be shared between two different > znodes. So, > > sharing logic in sa_handle_get_from_db seems to be redundant given > the logic of > > zfs_zget. In more strong words, I believe that allowing sharing in > > sa_handle_get_from_db is a bug because it could mask other bugs. > > > > > > > > That makes sense, but I'd like to see an analysis of all code paths that > can > > call sa_handle_get_from_db() before we remove this. E.g. zfs_mknode(). > > > > It seems like the object lock (ZFS_OBJ_HOLD_ENTER()) would need to be > held when > > sa_handle_get_from_db(SA_HDL_SHARED) is called. Have you tried adding an > > assertion to that effect? > > I haven't added the assertion yet, but I see no problem with doing so, so > I will > do it. Given that sa_handle_get_from_db(SA_HDL_SHARED) is called from > just two > places > > http://src.illumos.org/source/search?q=&defs=&refs=SA_HDL_SHARED&path=&hist=&project=illumos-gate > it was rather trivial to do manual static analysis :-) > > zfs_znode_sa_init already has the assertion. > In zfs_mknode an object is held quite early > > http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#829 > and the hold is kept until the end of the function, so the > sa_handle_get_from_db > call on line 893 is definitely covered. > > > > Could you explain the comment about the race condition in zfs_zget()? > Is it > > accurate? It seems like it is referring to the time between when > zfs_mknode() > > calls dmu_object_alloc() (or equivalent) and when it does > ZFS_OBJ_HOLD_ENTER(). > > It appears that if zfs_zget() is called during that time, it will > instantiate > > the SA, and then zfs_mknode() share it. > > I think that this is the race that the comment refers to, but the handling > of > the race seems to be different from what you described: > > > http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#672 > > It seems that zfs_znode_alloc would indeed instantiate the SA (via > zfs_znode_sa_init), but then it would see uninitialized SA data and so it > would > destroy the SA handle and return NULL. zfs_zget translates that NULL to > ENOENT, > which makes sense because the file is not really fully created yet. > Then, after getting hold of the object, zfs_mknode would create a new > handle and > then actually populate the SA. > > So no sharing happens in this case either. > > Ah. And the OBJ_HOLD lock prevents zfs_mknode() from calling zfs_znode_alloc() while zfs_zget() is calling it. That makes sense. --matt
_______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
