In addition to the two patches, there are two more patches on which I would
like to get some feedback.
These two are more radical: the 3rd addresses zone->lock contention on the
free path by skipping buddy merging for order-0 pages, while the 4th
addresses zone->lock contention on the allocation path by taking
pcp->batch pages off the free_area order-0 list without having to
iterate over the list.
Both patches are built on the observation that "the most time consuming part
of operations under zone->lock is cache misses on struct page".
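To illustrate why skipping the merge helps, here is a toy user-space sketch of the idea behind the 3rd patch. It is not the patch itself: toy_page, free_with_merge(), free_no_merge(), reset_pages() and page_touches are hypothetical names. The classic buddy free path reads the buddy's descriptor at every order it tries to merge (in the kernel, another struct page read under zone->lock), while an order-0 free that skips merging touches only the freed page's own descriptor.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES  16
#define MAX_ORDER 4   /* toy limit: merge up to order 3 blocks into order 4 */

/* Hypothetical stand-in for struct page: only what the free path reads. */
struct toy_page {
    bool free;   /* this page heads a free block */
    int order;   /* order of the free block it heads */
};

static struct toy_page pages[NR_PAGES];
static int page_touches;  /* reads of a *buddy's* descriptor (likely cache misses) */

static void reset_pages(void)
{
    for (int i = 0; i < NR_PAGES; i++) {
        pages[i].free = false;
        pages[i].order = 0;
    }
    page_touches = 0;
}

/* Classic buddy free: inspect the buddy at each order and merge upward. */
static void free_with_merge(int pfn, int order)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    while (order < MAX_ORDER) {
        int buddy = pfn ^ (1 << order);
        page_touches++;                 /* touching another page's descriptor */
        if (!pages[buddy].free || pages[buddy].order != order)
            break;
        pages[buddy].free = false;      /* pull the buddy off its free list */
        pfn &= buddy;                   /* merged block starts at the lower pfn */
        order++;
    }
    pages[pfn].free = true;
    pages[pfn].order = order;
}

/* Sketch of the idea: order-0 frees never look at the buddy. */
static void free_no_merge(int pfn, int order)
{
    assert(pfn >= 0 && pfn < NR_PAGES);
    if (order == 0) {
        pages[pfn].free = true;         /* only our own descriptor is written */
        pages[pfn].order = 0;
        return;
    }
    free_with_merge(pfn, order);
}
```

Freeing pfns 0 and 1 with free_with_merge() reads three buddy descriptors and leaves a single order-1 block at pfn 0; doing the same with free_no_merge() reads none but leaves two separate order-0 blocks. That fragmentation trade-off is what makes the approach debatable, even though it does not affect correctness.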
The 3rd patch may be controversial but has no correctness problems;
the 4th is at an early stage and serves only as a proof of concept.
Your comments are appreciated, thanks.