On 06/16/2016 08:43 AM, Michal Hocko wrote:
> [It seems that this patch has been sent several times, and this
> particular copy didn't CC Kirill, who added this code. CCing him now.]
> 
> On Thu 16-06-16 17:42:14, Michal Hocko wrote:
>> On Thu 16-06-16 19:36:11, zhongjiang wrote:
>>> From: zhong jiang <[email protected]>
>>>
>>> When a process acquires a pmd table shared by another process, we
>>> account it to the current process. Otherwise, a race has resulted
>>> in another task already setting the pud entry, so there is no need
>>> to increase the count.
>>>
>>> Signed-off-by: zhong jiang <[email protected]>
>>> ---
>>>  mm/hugetlb.c | 5 ++---
>>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 19d0d08..3b025c5 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>>     if (pud_none(*pud)) {
>>>             pud_populate(mm, pud,
>>>                             (pmd_t *)((unsigned long)spte & PAGE_MASK));
>>> -   } else {
>>> +   } else 
>>>             put_page(virt_to_page(spte));
>>> -           mm_inc_nr_pmds(mm);
>>> -   }
>>
>> The code is quite puzzling but is this correct? Shouldn't we rather do
>> mm_dec_nr_pmds(mm) in that path to undo the previous inc?

I agree that the code is quite puzzling. :(

However, if this were an issue I would have expected to see some reports.
Oracle DB makes use of this feature (shared page tables) and if the pmd
count is wrong we would catch it in check_mm() at exit time.

Upon closer examination, I believe the code in question is never executed.
Note the callers of huge_pmd_share.  The calling code looks like:

                        if (want_pmd_share() && pud_none(*pud))
                                pte = huge_pmd_share(mm, addr, pud);
                        else
                                pte = (pte_t *)pmd_alloc(mm, pud, addr);

Therefore, we do not call huge_pmd_share() unless pud_none(*pud).  The
else branch in question would only be executed when !pud_none(*pud), so
it is never reached from this caller.

I think that entire if/else statement can be removed.  We know
pud_none(*pud), so just do pud_populate().
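
To illustrate the reasoning, here is a rough user-space sketch, not
kernel code.  Every toy_* name is hypothetical, and it is single
threaded, so it models neither the page table lock nor concurrent
faults.  It assumes a shared pmd table is accounted once when the pud
is populated, and that an exit-time check expects the count to be zero
after teardown.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are not the kernel's types or helpers. */
struct toy_mm {
	long nr_pmds;	/* stands in for the per-mm pmd accounting */
	void *pud;	/* one pud slot; NULL plays the role of pud_none() */
};

static bool toy_pud_none(const struct toy_mm *mm)
{
	return mm->pud == NULL;
}

/*
 * Simplified share path: the caller only gets here when the pud is
 * empty, so populate unconditionally and account the table once.
 */
static void *toy_huge_pmd_share(struct toy_mm *mm, void *shared_pmd)
{
	mm->pud = shared_pmd;
	mm->nr_pmds++;
	return mm->pud;
}

/* Stand-in for pmd_alloc(): install a private table only if needed. */
static void *toy_pmd_alloc(struct toy_mm *mm, void *private_pmd)
{
	if (toy_pud_none(mm)) {
		mm->pud = private_pmd;
		mm->nr_pmds++;
	}
	return mm->pud;
}

/* Mirrors the shape of the calling code quoted above. */
static void *toy_huge_pte_alloc(struct toy_mm *mm, bool want_share,
				void *shared_pmd, void *private_pmd)
{
	if (want_share && toy_pud_none(mm))
		return toy_huge_pmd_share(mm, shared_pmd);
	return toy_pmd_alloc(mm, private_pmd);
}

/* Tear down the slot, then report whether the count balanced to zero. */
static bool toy_exit_check(struct toy_mm *mm)
{
	if (!toy_pud_none(mm)) {
		mm->pud = NULL;
		mm->nr_pmds--;
	}
	return mm->nr_pmds == 0;
}
```

In this model a second call on an already populated pud goes down the
toy_pmd_alloc() path and leaves the count unchanged, so the exit check
balances.  It says nothing about what happens if another task populates
the pud between the caller's check and the share path.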

-- 
Mike Kravetz

>>
>>> +
>>>     spin_unlock(ptl);
>>>  out:
>>>     pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>> -- 
>>> 1.8.3.1
>>
>> -- 
>> Michal Hocko
>> SUSE Labs
> 
