Miaohe Lin <linmia...@huawei.com> writes:

> On 2021/4/10 1:17, Tim Chen wrote:
>> 
>> 
>> On 4/9/21 1:42 AM, Miaohe Lin wrote:
>>> On 2021/4/9 5:34, Tim Chen wrote:
>>>>
>>>>
>>>> On 4/8/21 6:08 AM, Miaohe Lin wrote:
>>>>> When I was investigating the swap code, I found the below possible race
>>>>> window:
>>>>>
>>>>> CPU 1                                     CPU 2
>>>>> -----                                     -----
>>>>> do_swap_page
>>>>>   synchronous swap_readpage
>>>>>     alloc_page_vma
>>>>>                                   swapoff
>>>>>                                     release swap_file, bdev, or ...
>>>>
>>>
>>> Many thanks for quick review and reply!
>>>
>>>> Perhaps I'm missing something.  The release of swap_file, bdev etc
>>>> happens after we have cleared the SWP_VALID bit in si->flags in 
>>>> destroy_swap_extents
>>>> if I read the swapoff code correctly.
>>> Agree. Let's look this more close:
>>> CPU1                                                                CPU2
>>> -----                                                               -----
>>> swap_readpage
>>>   if (data_race(sis->flags & SWP_FS_OPS)) {
>>>                                                             swapoff
>>>                                                               p->swap_file 
>>> = NULL;
>>>     struct file *swap_file = sis->swap_file;
>>>     struct address_space *mapping = swap_file->f_mapping;[oops!]
>>>                                                               ...
>>>                                                               p->flags = 0;
>>>     ...
>>>
>>> Does this make sense for you?
>> 
>> p->swapfile = NULL happens after the 
>> p->flags &= ~SWP_VALID, synchronize_rcu(), destroy_swap_extents() sequence 
>> in swapoff().
>> 
>> So I don't think the sequence you illustrated on CPU2 is in the right order.
>> That said, without get_swap_device/put_swap_device in swap_readpage, you 
>> could
>> potentially blow pass synchronize_rcu() on CPU2 and causes a problem.  so I 
>> think
>> the problematic race looks something like the following:
>> 
>> 
>> CPU1                                                         CPU2
>> -----                                                                -----
>> swap_readpage
>>   if (data_race(sis->flags & SWP_FS_OPS)) {
>>                                                              swapoff
>>                                                                p->flags = &= 
>> ~SWP_VALID;
>>                                                                ..
>>                                                                
>> synchronize_rcu();
>>                                                                ..
>>                                                                p->swap_file 
>> = NULL;
>>     struct file *swap_file = sis->swap_file;
>>     struct address_space *mapping = swap_file->f_mapping;[oops!]
>>                                                                ...
>>     ...
>> 
>
> Agree. This is also what I meant to illustrate. And you provide a better one. 
> Many thanks!

For the pages that are swapped in through swap cache.  That isn't an
issue.  Because the page is locked, the swap entry will be marked with
SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been
unlocked.

So the race is for the fast path as follows,

                if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
                    __swap_count(entry) == 1)

I found it in your original patch description.  But please make it more
explicit to reduce the potential confusing.

Best Regards,
Huang, Ying

Reply via email to