On Thu, Jun 29, 2017 at 03:49:05PM -0400, Jeff Mahoney wrote: > On 6/29/17 3:21 PM, Omar Sandoval wrote: > > On Thu, Jun 22, 2017 at 09:51:47AM -0400, [email protected] wrote: > >> From: Jeff Mahoney <[email protected]> > >> > >> In a heavy write scenario, we can end up with a large number of pinned > >> bytes. > >> This can translate into (very) premature ENOSPC because pinned bytes > >> must be accounted for when allowing a reservation but aren't accounted for > >> when deciding whether to create a new chunk. > >> > >> This patch adds the accounting to should_alloc_chunk so that we can > >> create the chunk. > > > > Hey, Jeff, > > Hi Omar - > > > Does this fix your ENOSPC problem on a fresh filesystem? I just tracked > > No, it didn't. It helped somewhat, but we were still hitting it > frequently. What did help was reverting "Btrfs: skip commit transaction > if we don't have enough pinned bytes" (not upstream yet, on the list).
I see, that makes sense. > > down an ENOSPC issue someone here reported when doing a btrfs send to a > > fresh filesystem and it sounds a lot like your issue: metadata > > bytes_may_use shoots up but we don't allocate any chunks for it. I'm not > > seeing how including bytes_pinned will help for this case. We won't have > > any pinned bytes when populating a new fs, right? > > Our test environment is just installing the OS. That means lots of > creates, writes, and then renames, so there's a fair amount of metadata > churn that results in elevated pinned_bytes. Rsync can cause the same > workload pretty easily too. Nikolay was going to look into coming up > with a configuration for fsstress that would emulate it. The reproducer I have is a ~1.7GB btrfs receive onto a brand new 3GB filesystem. In my case, nothing (or very little) was getting pinned, but it makes sense that it's different for your case. > > I don't have a good solution. Allocating chunks based on bytes_may_use > > is going to way over-allocate because of our worst-case estimations. I'm > > double-checking now that the flusher is doing the right thing and not > > missing anything. I'll keep digging, just wanted to know if you had any > > thoughts. > > My suspicion is that it all just happens to work and that there are > several bugs working together that approximate a correct result. It certainly feels that way :) > My > reasoning is that the patch I referenced above is correct. The logic in > may_commit_transaction is inverted and causing a ton of additional > transaction commits. I think that having the additional transaction > commits is serving to free pinned bytes more quickly so things just work > for the most part and pinned bytes doesn't play as much of a role. But > once the transaction count comes down, that pinned bytes count gets > elevated and becomes more important. I think it should be taken into > account to determine whether committing a transaction early will result > in releasing enough space to honor the reservation without allocating a > new chunk. If the answer is yes, flush it. If no, there's no point in > flushing it now, so just allocate the chunk and move on. > > The big question is where this 80% number comes into play. > > There is a caveat here: almost all of our testing has been on 4.4 with a > bunch of these patches backported. I believe the same issue will still > be there on mainline, but we're in release crunch mode and I haven't had > a chance to test more fully. What's weird is that my reproducer hits this very frequently (>50% of the time) on our internal kernel build, which is 4.6 + backports up to 4.12, but upstream 4.12-rc7 hits it much less frequently (~5% of the time). Anyways, this is all getting messy in my head, so I'm just going to go head down on this for a little while and see what I can come up with. Thanks for the reply! > -Jeff > > >> Signed-off-by: Jeff Mahoney <[email protected]> > >> --- > >> fs/btrfs/extent-tree.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > >> index 33d979e9ea2a..88b04742beea 100644 > >> --- a/fs/btrfs/extent-tree.c > >> +++ b/fs/btrfs/extent-tree.c > >> @@ -4377,7 +4377,7 @@ static int should_alloc_chunk(struct btrfs_fs_info > >> *fs_info, > >> { > >> struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv; > >> u64 num_bytes = sinfo->total_bytes - sinfo->bytes_readonly; > >> - u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved; > >> + u64 num_allocated = sinfo->bytes_used + sinfo->bytes_reserved + > >> sinfo->bytes_pinned; > >> u64 thresh; > >> > >> if (force == CHUNK_ALLOC_FORCE) > >> -- > >> 2.11.0 > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >> the body of a message to [email protected] > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > -- > Jeff Mahoney > SUSE Labs > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
