on 06/11/2013 21:26 Matthew Ahrens said the following:
> On Tue, Oct 22, 2013 at 12:30 AM, Andriy Gapon <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     on 18/10/2013 20:43 Matthew Ahrens said the following:
>     > It's shared among everyone who has a hold on the dbuf.  When there are 
> no more
>     > holds, the dbuf may be evicted, and call sa_evict() to get rid of the
>     shared sa
>     > handle.  Saving the sa_handle_t in the dbufs' user pointer allows e.g.
>     > zfs_zget() to find it.  Although I guess it does use "immediate evict", 
> so
>     you'd
>     > probably only find it if 2 threads are concurrently accessing the same 
> object?
>     >  I would presume that is possible.
> 
>     So now I see it.  Essentially SA_HDL_SHARED allows to find sa from dbuf 
> using
>     dmu_buf_get_user.  But that does not imply sharing by itself.
> 
>     Some observations:
>     - immediate evict is used indeed
>     - sa_evict consists only of a call to panic, so I guess that it is not 
> expected
>     to be evicted before sa_handle_destroy is called
>     - all calls to sa_handle_get_from_db(SA_HDL_SHARED) happen within 
> ZFS_OBJ_HOLD
> 
>     I guess that the implication is that a znode normally[*] maintains a hold 
> on the
>     dbuf that backs the sa handle and thus there can be no eviction.
>     [*] - this is not true only between zfs_suspend_fs and zfs_resume_fs.
> 
>     > Have you tried adding an assertion (or dtracing) that there is no 
> sharing
>     (i.e.
>     > that we never find an existing sa handle attached to the dbuf, from 
> zfs_zget()
>     > or sa_handle_get_from_db())?  You would probably need a workload that 
> accesses
>     > the same object concurrently, perhaps over NFS (so it has to use 
> zfs_zget()
>     > rather than having the vnode in hand).  That should show you if / how 
> the
>     > sharing code is exercised.
> 
>     Yes, I have tried that and could not detect any sharing occurring in
>     sa_handle_get_from_db.
> 
>     Looking at zfs_zget specifically it does the following (many details 
> omitted):
>     ZFS_OBJ_HOLD_ENTER
>     sa_buf_hold
>     hdl = dmu_buf_get_user
>     if hdl != NULL
>       ...
>     else
>       ...
>       sa_handle_get_from_db(SA_HDL_SHARED)
>       ...
>     ZFS_OBJ_HOLD_EXIT
> 
>     So no sharing should happen in sa_handle_get_from_db, because of the 
> locking and
>     checking in zfs_zget.
> 
>     To summarize.  A sa handle can not be shared between two different 
> znodes.  So,
>     sharing logic in sa_handle_get_from_db seems to be redundant given the 
> logic of
>     zfs_zget.  In more strong words, I believe that allowing sharing in
>     sa_handle_get_from_db is a bug because it could mask other bugs.
> 
> 
> 
> That makes sense, but I'd like to see an analysis of all code paths that can
> call sa_handle_get_from_db() before we remove this.   E.g. zfs_mknode().
> 
> It seems like the object lock (ZFS_OBJ_HOLD_ENTER()) would need to be held 
> when
> sa_handle_get_from_db(SA_HDL_SHARED) is called.  Have you tried adding an
> assertion to that effect?

I haven't added the assertion yet, but I see no problem with doing so, so I will
do it.  Given that sa_handle_get_from_db(SA_HDL_SHARED) is called from just two
places
http://src.illumos.org/source/search?q=&defs=&refs=SA_HDL_SHARED&path=&hist=&project=illumos-gate
it was rather trivial to do manual static analysis :-)

zfs_znode_sa_init already has the assertion.
In zfs_mknode an object is held quite early
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#829
and the hold is kept until the end of the function, so the sa_handle_get_from_db
call on line 893 is definitely covered.


> Could you explain the comment about the race condition in zfs_zget()?  Is it
> accurate?  It seems like it is referring to the time between when zfs_mknode()
> calls dmu_object_alloc() (or equivalent) and when it does 
> ZFS_OBJ_HOLD_ENTER().
> It appears that if zfs_zget() is called during that time, it will instantiate
> the SA, and then zfs_mknode() share it.

I think that this is the race that the comment refers to, but the handling of
the race seems to be different from what you described:

http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#672

It seems that zfs_znode_alloc would indeed instantiate the SA (via
zfs_znode_sa_init), but then it would see uninitialized SA data and so it would
destroy the SA handle and return NULL.  zfs_zget translates that NULL to ENOENT,
which makes sense because the file is not really fully created yet.
Then, after getting hold of the object, zfs_mknode would create a new handle and
then actually populate the SA.

So no sharing happens in this case either.

-- 
Andriy Gapon
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to