On Sat, Nov 30, 2013 at 2:05 AM, Andriy Gapon <[email protected]> wrote:

> on 06/11/2013 21:26 Matthew Ahrens said the following:
> > On Tue, Oct 22, 2013 at 12:30 AM, Andriy Gapon <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     on 18/10/2013 20:43 Matthew Ahrens said the following:
> >     > It's shared among everyone who has a hold on the dbuf.  When there
> are no more
> >     > holds, the dbuf may be evicted, and call sa_evict() to get rid of
> the
> >     shared sa
> >     > handle.  Saving the sa_handle_t in the dbufs' user pointer allows
> e.g.
> >     > zfs_zget() to find it.  Although I guess it does use "immediate
> evict", so
> >     you'd
> >     > probably only find it if 2 threads are concurrently accessing the
> same object?
> >     >  I would presume that is possible.
> >
> >     So now I see it.  Essentially SA_HDL_SHARED allows to find sa from
> dbuf using
> >     dmu_buf_get_user.  But that does not imply sharing by itself.
> >
> >     Some observations:
> >     - immediate evict is used indeed
> >     - sa_evict consists only of a call to panic, so I guess that it is
> not expected
> >     to be evicted before sa_handle_destroy is called
> >     - all calls to sa_handle_get_from_db(SA_HDL_SHARED) happen within
> ZFS_OBJ_HOLD
> >
> >     I guess that the implication is that a znode normally[*] maintains a
> hold on the
> >     dbuf that backs the sa handle and thus there can be no eviction.
> >     [*] - this is not true only between zfs_suspend_fs and zfs_resume_fs.
> >
> >     > Have you tried adding an assertion (or dtracing) that there is no
> sharing
> >     (i.e.
> >     > that we never find an existing sa handle attached to the dbuf,
> from zfs_zget()
> >     > or sa_handle_get_from_db())?  You would probably need a workload
> that accesses
> >     > the same object concurrently, perhaps over NFS (so it has to use
> zfs_zget()
> >     > rather than having the vnode in hand).  That should show you if /
> how the
> >     > sharing code is exercised.
> >
> >     Yes, I have tried that and could not detect any sharing occurring in
> >     sa_handle_get_from_db.
> >
> >     Looking at zfs_zget specifically it does the following (many details
> omitted):
> >     ZFS_OBJ_HOLD_ENTER
> >     sa_buf_hold
> >     hdl = dmu_buf_get_user
> >     if hdl != NULL
> >       ...
> >     else
> >       ...
> >       sa_handle_get_from_db(SA_HDL_SHARED)
> >       ...
> >     ZFS_OBJ_HOLD_EXIT
> >
> >     So no sharing should happen in sa_handle_get_from_db, because of the
> locking and
> >     checking in zfs_zget.
> >
> >     To summarize.  A sa handle can not be shared between two different
> znodes.  So,
> >     sharing logic in sa_handle_get_from_db seems to be redundant given
> the logic of
> >     zfs_zget.  In more strong words, I believe that allowing sharing in
> >     sa_handle_get_from_db is a bug because it could mask other bugs.
> >
> >
> >
> > That makes sense, but I'd like to see an analysis of all code paths that
> can
> > call sa_handle_get_from_db() before we remove this.   E.g. zfs_mknode().
> >
> > It seems like the object lock (ZFS_OBJ_HOLD_ENTER()) would need to be
> held when
> > sa_handle_get_from_db(SA_HDL_SHARED) is called.  Have you tried adding an
> > assertion to that effect?
>
> I haven't added the assertion yet, but I see no problem with doing so, so
> I will
> do it.  Given that sa_handle_get_from_db(SA_HDL_SHARED) is called from
> just two
> places
>
> http://src.illumos.org/source/search?q=&defs=&refs=SA_HDL_SHARED&path=&hist=&project=illumos-gate
> it was rather trivial to do manual static analysis :-)
>
> zfs_znode_sa_init already has the assertion.
> In zfs_mknode an object is held quite early
>
> http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#829
> and the hold is kept until the end of the function, so the
> sa_handle_get_from_db
> call on line 893 is definitely covered.
>
>
> > Could you explain the comment about the race condition in zfs_zget()?
>  Is it
> > accurate?  It seems like it is referring to the time between when
> zfs_mknode()
> > calls dmu_object_alloc() (or equivalent) and when it does
> ZFS_OBJ_HOLD_ENTER().
> > It appears that if zfs_zget() is called during that time, it will
> instantiate
> > the SA, and then zfs_mknode() share it.
>
> I think that this is the race that the comment refers to, but the handling
> of
> the race seems to be different from what you described:
>
>
> http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/zfs_znode.c#672
>
> It seems that zfs_znode_alloc would indeed instantiate the SA (via
> zfs_znode_sa_init), but then it would see uninitialized SA data and so it
> would
> destroy the SA handle and return NULL.  zfs_zget translates that NULL to
> ENOENT,
> which makes sense because the file is not really fully created yet.
> Then, after getting hold of the object, zfs_mknode would create a new
> handle and
> then actually populate the SA.
>
> So no sharing happens in this case either.
>
>
Ah.  And the OBJ_HOLD lock prevents zfs_mknode() from calling
zfs_znode_alloc() while zfs_zget() is calling it.  That makes sense.

--matt
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to