On Wed, Jun 07, 2017 at 01:18:10PM -0700, Liu Bo wrote:
> On Tue, Jun 06, 2017 at 04:45:31PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osan...@fb.com>
> > 
> > The total_bytes_pinned counter is completely broken when accounting
> > delayed refs:
> > 
> > - If two drops for the same extent are merged, we will decrement
> >   total_bytes_pinned twice but only increment it once.
> > - If an add is merged into a drop or vice versa, we will decrement the
> >   total_bytes_pinned counter but never increment it.
> > - If multiple references to an extent are dropped, we will account it
> >   multiple times, potentially vastly over-estimating the number of bytes
> >   that will be freed by a commit and doing unnecessary work when we're
> >   close to ENOSPC.
> > 
> > The last issue is relatively minor, but the first two make the
> > total_bytes_pinned counter leak or underflow very often. These
> > accounting issues were introduced in b150a4f10d87 ("Btrfs: use a percpu
> > to keep track of possibly pinned bytes"), but they were papered over by
> > zeroing out the counter on every commit until d288db5dc011 ("Btrfs: fix
> > race of using total_bytes_pinned").
> > 
> > We need to make sure that an extent is accounted as pinned exactly once
> > if and only if we will drop references to it when when the transaction
> > is committed. Ideally we would only add to total_bytes_pinned when the
> > *last* reference is dropped, but this information isn't readily
> > available for data extents. Again, this over-estimation can lead to
> > extra commits when we're close to ENOSPC, but it's not as bad as before.
> > 
> > The fix implemented here is to increment total_bytes_pinned when the
> > total refmod count for an extent goes negative and decrement it if the
> > refmod count goes back to non-negative or after we've run all of the
> > delayed refs for that extent.
> >
> 
> The patch could be cleaner if we inc/dec %pinned inside delayed_ref.c.
> 
> The idea looks good to me.
> 
> Reviewed-by: Liu Bo <bo.li....@oracle.com>

Yeah, I think that'll work. My first reaction was that it'd be a
layering violation, but I think it makes sense, this counter really is
necessary because of delayed refs.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to