Miaohe Lin <linmia...@huawei.com> writes:

> On 2021/3/30 9:57, Huang, Ying wrote:
>> Hi, Miaohe,
>> 
>> Miaohe Lin <linmia...@huawei.com> writes:
>> 
>>> Hi all,
>>> I am investigating the swap code, and I found the below possible race 
>>> window:
>>>
>>> CPU 1                                                       CPU 2
>>> -----                                                       -----
>>> do_swap_page
>>>   skip swapcache case (synchronous swap_readpage)
>>>     alloc_page_vma
>>>                                                     swapoff
>>>                                                       release swap_file, 
>>> bdev, or ...
>>>       swap_readpage
>>>     check sis->flags is ok
>>>       access swap_file, bdev or ...[oops!]
>>>                                                         si->flags = 0
>>>
>>> The swapcache case is ok because swapoff will wait on the page_lock of 
>>> swapcache page.
>>> Is this will really happen or Am I miss something ?
>>> Any reply would be really grateful. Thanks! :)
>> 
>> This appears possible.  Even for swapcache case, we can't guarantee the
>
> Many thanks for reply!
>
>> swap entry gotten from the page table is always valid too.  The
>
> The page table may change at any time. And we may thus do some useless work.
> But the pte_same() check could handle these races correctly if these do not
> result in oops.
>
>> underlying swap device can be swapped off at the same time.  So we use
>> get/put_swap_device() for that.  Maybe we need similar stuff here.
>
> Using get/put_swap_device() to guard against swapoff for swap_readpage() 
> sounds
> really bad as swap_readpage() may take really long time. Also such race may 
> not be
> really hurtful because swapoff is usually done when system shutdown only.
> I can not figure some simple and stable stuff out to fix this. Any 
> suggestions or
> could anyone help get rid of such race?

Some reference counting on the swap device can prevent swap device from
swapping-off.  To reduce the performance overhead on the hot-path as
much as possible, it appears we can use the percpu_ref.

Best Regards,
Huang, Ying

Reply via email to