On 12/19/18 9:06 AM, Ming Lei wrote:
> On Wed, Dec 19, 2018 at 08:50:09AM -0700, Jens Axboe wrote:
>> DM currently has a statically allocated bio that it uses to issue empty
>> flushes. It doesn't submit this bio, it just uses it for maintaining
>> state while setting up clones. Multiple users can access this bio at the
>> same time. This wasn't previously an issue, even if it was a bit iffy,
>> but with the blkg associations it can become one.
>>
>> We setup the blkg association, then clone bio's and submit, then remove
>> the blkg assocation again. But since we can have multiple tasks doing
>> this at the same time, against multiple blkg's, then we can either lose
>> references to a blkg, or put it twice. The latter causes complaints on
>> the percpu ref being <= 0 when released, and can cause use-after-free as
>> well. Ming reports that xfstest generic/475 triggers this:
>>
>> ------------[ cut here ]------------
>> percpu ref (blkg_release) <= 0 (0) after switching to atomic
>> WARNING: CPU: 13 PID: 0 at lib/percpu-refcount.c:155
>> percpu_ref_switch_to_atomic_rcu+0x2c9/0x4a0
>>
>> Switch to just using an on-stack bio for this, and get rid of the
>> embedded bio.
>>
>> Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device")
>> Reported-by: Ming Lei <[email protected]>
>> Signed-off-by: Jens Axboe <[email protected]>
>>
>> ---
>>
>> Only other use case like this I could find is raid5-cache, which
>> serializes access to its embedded bio (and actually submits it, too).
>> That one looks fine.
>
> With this patch, xfstests(generic/475) won't trigger the warning of
> 'percpu ref (blkg_release) <= 0 (0) after switching to atomic' any more.
>
> Tested-by: Ming Lei <[email protected]>
Great, thanks Ming. I think this was somewhat of a ticking time bomb,
so good to have it fixed even if we could have done without a bug. And
thanks for all the testing!
--
Jens Axboe