Hi Tejun,
Thanks very much for reviewing this patch.
On 18/2/8 05:38, Tejun Heo wrote:
> Hello, Joseph.
>
> On Wed, Feb 07, 2018 at 04:40:02PM +0800, Joseph Qi wrote:
>> writeback kworker
>> blkcg_bio_issue_check
>> rcu_read_lock
>> blkg_lookup
>> <<< *race window*
>> blk_throtl_bio
>> spin_lock_irq(q->queue_lock)
>> spin_unlock_irq(q->queue_lock)
>> rcu_read_unlock
>>
>> cgroup_rmdir
>> cgroup_destroy_locked
>> kill_css
>> css_killed_ref_fn
>> css_killed_work_fn
>> offline_css
>> blkcg_css_offline
>> spin_trylock(q->queue_lock)
>> blkg_destroy
>> spin_unlock(q->queue_lock)
>
> Ah, right. Thanks for spotting the bug.
>
>> Since rcu can only prevent blkg from releasing when it is being used,
>> the blkg->refcnt can be decreased to 0 during blkg_destroy and schedule
>> blkg release.
>> Then trying to blkg_get in blk_throtl_bio will complains the WARNING.
>> And then the corresponding blkg_put will schedule blkg release again,
>> which result in double free.
>> This race is introduced by commit ae1188963611 ("blkcg: consolidate blkg
>> creation in blkcg_bio_issue_check()"). Before this commit, it will lookup
>> first and then try to lookup/create again with queue_lock. So revive
>> this logic to fix the race.
>
> The change seems a bit drastic to me. Can't we do something like the
> following instead?
>
> blk_throtl_bio()
> {
> ... non throttled cases ...
>
> /* out-of-limit, queue to @tg */
>
> /*
> * We can look up and retry but the race window is tiny here.
> * Just letting it through should be good enough.
> */
> if (!css_tryget(blkcg->css))
> goto out;
>
> ... actual queueing ...
> css_put(blkcg->css);
> ...
> }
So you mean checking css->refcnt to prevent the further use of
blkg_get? I think it makes sense.
IMO, we should use css_tryget_online instead, and rightly after taking
queue_lock. Because there may be more use of blkg_get in blk_throtl_bio
in the futher. Actually it already has two now. One is in
blk_throtl_assoc_bio, and the other is in throtl_qnode_add_bio.
What do you think of this?
Thanks,
Joseph