On Mon, Jun 16, 2014 at 11:45:42PM -0400, Waiman Long wrote:
> On 06/16/2014 04:59 PM, Kirill A. Shutemov wrote:
> >On Mon, Jun 16, 2014 at 11:49:34PM +0300, Kirill A. Shutemov wrote:
> >>On Mon, Jun 16, 2014 at 03:35:48PM -0400, Waiman Long wrote:
> >>>In the __split_huge_page_map() function, the check for
> >>>page_mapcount(page) is invariant within the for loop. Because of the
> >>>fact that the macro is implemented using atomic_read(), the redundant
> >>>check cannot be optimized away by the compiler, leading to an
> >>>unnecessary read of the page structure.
> >And atomic_read() is *not* an atomic operation. It's implemented as
> >dereferencing through a cast to volatile, which suppresses compiler
> >optimization, but doesn't affect what the CPU can do with the variable.
> >
> >So I doubt the difference will be measurable anywhere.
> >
>
> Because it is treated as a volatile object, the compiler will have to
> reread the value of the relevant page structure field in every iteration of
> the loop (512 for x86) when pmd_write(*pmd) is true. I saw some slight
> improvement (about 2%) of a microbench that I wrote to break up 1000 THPs
> with 1000 forked processes.

Then bring a patch with performance data.

-- 
 Kirill A. Shutemov
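
For readers following the thread, the point under discussion can be sketched in plain userspace C. This is only an illustration, not the kernel code or the proposed patch: struct fake_page, READ_VOLATILE(), fake_page_mapcount(), NR_ENTRIES and both count_*() functions are hypothetical names made up for this sketch. READ_VOLATILE() assumes the "dereference through a cast to volatile" behaviour described above, and 512 is the x86 iteration count mentioned in the thread.

```c
/* Hypothetical, simplified stand-in for struct page: only the one
 * field this sketch needs. */
struct fake_page {
    int mapcount;
};

/* Assumption: models a read done by dereferencing through a cast to
 * volatile, as described in the thread. The compiler must emit a load
 * every time this expression is evaluated. */
#define READ_VOLATILE(x) (*(volatile int *)&(x))

/* 512 iterations per huge page on x86, per the thread. */
#define NR_ENTRIES 512

static int fake_page_mapcount(struct fake_page *page)
{
    return READ_VOLATILE(page->mapcount);
}

/* Loop-invariant check left inside the loop: because the read goes
 * through a volatile access, it is repeated in every iteration where
 * 'writable' is true. */
static int count_rechecked(struct fake_page *page, int writable)
{
    int n = 0;

    for (int i = 0; i < NR_ENTRIES; i++) {
        if (writable && fake_page_mapcount(page) == 1)
            n++;
    }
    return n;
}

/* Same result with the check hoisted by hand: at most one volatile
 * read, done before the loop starts. */
static int count_hoisted(struct fake_page *page, int writable)
{
    int n = 0;
    int single_mapping = writable && fake_page_mapcount(page) == 1;

    for (int i = 0; i < NR_ENTRIES; i++) {
        if (single_mapping)
            n++;
    }
    return n;
}
```

Both functions return the same count as long as nothing else modifies the field while the loop runs. The only difference is how many loads the compiler has to emit, and whether that shows up in a real workload is exactly what measurement would have to settle.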

