OK, so I'm using 3.17-rc3, same test on a flash usb drive, no
autodefrag. The situation is even stranger. The rsync is clearly
stuck, it's trying to write the same file for much more than 120 secs.
However dmesg is clean, no "INFO: task kworker/u16:11:1763 blocked for
more than 120 seconds" or anything.
df is responsive but shows no increase in used space.
Note that with autodefrag this bug is completely reproducible: the
hung-task messages start to show up almost immediately.

Oh wait (I'm watching this live): now rsync is unstuck, files are
being written, and df shows an increase in used space. BUT there is
still no hung-task message in the kernel log, even though rsync was
actually stuck for several minutes.

So, to summarize, same conditions except no autodefrag. Result:
process stuck for way more than 120 secs but this time no complaints
in the kernel log.
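If I understand the hung-task detector correctly, that would explain the
silence: khungtaskd only complains about tasks that stay in uninterruptible
("D") sleep for longer than hung_task_timeout_secs, so a process that is
stuck but sleeping interruptibly never trips it. Next time it happens I'll
check something like this (just a sketch; the shell's own PID stands in
for rsync's here):

```shell
# Threshold used by the detector (seconds, typically 120); the file may be
# absent if the kernel was built without CONFIG_DETECT_HUNG_TASK.
cat /proc/sys/kernel/hung_task_timeout_secs 2>/dev/null

# Field 3 of /proc/<pid>/stat is the task state; "D" (uninterruptible
# sleep) is the only state the detector complains about.
# $$ (this shell) is a stand-in for the stuck rsync's PID.
awk '{print "state:", $3}' /proc/$$/stat
```

If the stuck rsync shows "S" there rather than "D", no hung-task warning
would be expected even after minutes of no progress.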

Thanks
John



On Tue, Sep 2, 2014 at 10:23 PM, john terragon <jterra...@gmail.com> wrote:
> I don't know what to tell you about the ENOSPC code being heavily
> involved. At this point I'm using this simple test to see if things
> improve:
>
> -freshly created btrfs on dmcrypt,
> -rsync some stuff (since the fs is empty I could just use cp but I
> keep the test the same as it was when I had the problem for the first
> time)
> -note: the rsynced stuff is about the size of the volume but with
> compression I always end up with 1/2 to 3/4 free space
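
(Annotating my own quote: the steps above, as a dry-run sketch. Every name
in it — /dev/sdX, /path/to/data, the "testcrypt" mapping — is a placeholder,
and it assumes the LUKS container already exists. As written it only prints
the commands, since running them needs root and wipes the device.)

```shell
#!/bin/sh
# Dry-run sketch of the test steps quoted above. Replace the echo in run()
# with "$@" to actually execute the commands; as written, it only prints
# them, because they need root and destroy whatever is on the device.
run() { echo "+ $*"; }

run cryptsetup open /dev/sdX testcrypt          # dm-crypt layer (existing LUKS)
run mkfs.btrfs -f /dev/mapper/testcrypt         # freshly created btrfs
run mount -o compress /dev/mapper/testcrypt /mnt
run rsync -a /path/to/data/ /mnt/               # the step that hangs
run df -h /mnt                                  # watch used space
```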
>
> I'm not sure how I would even get close to involving the ENOSPC code,
> but then again I'm probably not fully aware of the inner workings of btrfs.
>
>> Can you try flipping off autodefrag?
>
> As soon as the damn unkillable rsync decides to obey the kill -9...
>
> Thanks
>
> John
>
> On Tue, Sep 2, 2014 at 10:10 PM, Chris Mason <c...@fb.com> wrote:
>>> On 09/02/2014 03:56 PM, john terragon wrote:
>>> Nice...now I get the hung task even with 3.14.17.... And I tried with
>>> 4K for node and leaf size...same result. And to top it all off, today
>>> I've been bitten by the bug also on my main root fs (which is on two
>>> fast SSDs), although with 3.16.1.
>>>
>>> Is it at least safe for the data? I mean, as long as the hung process
>>> terminates and no other error shows up, can I at least be sure that
>>> the data written is correct?
>>
>> Your traces are a little different.  The ENOSPC code is throttling
>> things to make sure you have enough room for the writes you're doing.
>> The code we have in 3.17-rc3 (or my for-linus branch) is the best
>> choice right now.  You can pull that down to 3.16 if you want all the
>> fixes on a more stable kernel.
>>
>> Nailing down the ENOSPC code is going to be a little different.  I think
>> autodefrag probably isn't interacting well with being short on space and
>> encryption.  This is leading to much more IO than we'd normally do, and
>> dm-crypt makes it fairly intensive.
>>
>> Can you try flipping off autodefrag?
>>
>> -chris
