[Libhugetlbfs-devel] [PATCH 7/7] tlbmiss_cost.sh: Rework method by which TLB miss cost is calculated with oprofile

Mel Gorman Wed, 04 Nov 2009 07:35:27 -0800

The mechanism as-is for calculating TLB misses does not appear to work.
At least on PPC970 which I have courtesy of Terrasoft, the DTLB misses
and page table walk cycles cannot be calculated at the same time and it
is not the method used in the paper I wrote on the calculation of TLB
miss costs.


This patch implements TLB miss cost calculations with oprofile as
described on the paper. It uses two separate runs to calculate TLB
misses and the time spent walking the page tables when running the
STREAM benchmark. As the rate the events occur at deviate so
significantly, the oprofile reports would ordinarily not be compatible.
To get around this, the second oprofile run is executed multiple times
until the number of cycles spent running STREAM are roughly comparable.
It then calculates the cost of a TLB miss based on the number of cycles
spent walking the page tables and the number of misses that occured.

Example output looks like

$ tlbmiss_cost.sh --verbose
TRACE: Forcing shutdown of oprofile
TRACE: Gathering the name of CPU events
TRACE: Warming the benchmark to avoid page faults of the binary
TRACE: oprofile launch commands as follows
TRACE: dtlb misses :: oprofile_start --event timer --event dtlb_miss 
--sample-cycle-factor 5
TRACE: tablewalk cycles :: oprofile_start --event timer30 --event
tablewalk_cycles --sample-cycle-factor 5 --sample-event-factor
TRACE: Rerunning benchmark to measure number of DTLB misses
TRACE: Shutting down oprofile
TRACE: Running benchmark to measure table walk cycles
TRACE: High diff with scaling x1. Required 595239 +/ 23809, got 158913
TRACE: High diff with scaling x4. Required 595239 +/ 23809, got 369069
TRACE: High diff with scaling x7. Required 595239 +/ 23809, got 490783
TRACE: High diff with scaling x9. Required 595239 +/ 23809, got 553222
TRACE: Cycle DTLB Scale: 50000
TRACE: Cycle Walk Scale: 50000
TRACE: DTLB       Scale: 1000
TRACE: Walk       Scale: 10000
TRACE: Cycle DTLB events: 594671 + 568 = 595239
TRACE: Cycle Walk events: 577426 + 1308 = 578734
TRACE: DTLB    events: 9789 + 104 = 9893
TRACE: Walk    events: 516100 + 1359 = 517459
TLB_MISS_COST=523

So, on PPC970 at least, the cost of a TLB miss is approximately 523
cycles. The method still needs to be verified on different families of POWER.

Signed-off-by: Mel Gorman <m...@csn.ul.ie>
---
 tlbmiss_cost.sh |  164 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 132 insertions(+), 32 deletions(-)

diff --git a/tlbmiss_cost.sh b/tlbmiss_cost.sh
index 792f62d..f376871 100755
--- a/tlbmiss_cost.sh
+++ b/tlbmiss_cost.sh
@@ -8,6 +8,7 @@
 #
 # Licensed under LGPL 2.1 as packaged with libhugetlbfs
 # (c) Eric B Munson 2009
+# (c) Mel Gorman 2009
 
 # calibrator can be found here:
 # http://homepages.cwi.nl/~manegold/Calibrator/v0.9e/calibrator.c
@@ -186,7 +187,7 @@ calibrator_calc()
 # This method uses the stream memory benchmark which can be found here:
 # http://www.cs.virginia.edu/stream/FTP/Code/stream.c
 # and should be compiled with this command line:
-# gcc -O3 -DN=44739240 stream.c -o STREAM
+# gcc -m32 -O3 -DN=44739240 stream.c -o STREAM
 # and then placed in the same directory as this script
 
 stream_fetch()
@@ -255,53 +256,152 @@ oprofile_calc()
                die "Unable to locate oprofile_start.sh."
        fi
 
-       print_trace Beginning TLB measurement using STREAM and oprofile
+       print_trace Forcing shutdown of oprofile
+       opcontrol --shutdown >/dev/null 2>&1
+       opcontrol --deinit >/dev/null 2>&1
 
+       print_trace Gathering the name of CPU events
+       TIMER_DTLB_EVENT=`oprofile_map_events.pl --event timer | cut -d: -f1 2> 
/dev/null`
+       TIMER_WALK_EVENT=`oprofile_map_events.pl --event timer30 | cut -d: -f1 
2> /dev/null`
+       WALK_EVENT=`oprofile_map_events.pl --event tablewalk_cycles | cut -d: 
-f1 2> /dev/null`
+       DTLB_EVENT=`oprofile_map_events.pl --event dtlb_miss | cut -d: -f1 2> 
/dev/null`
 
-       # First we run without hugepages to get a baseline
-       print_trace Starting oprofile_start --event dtlb_miss --event 
tablewalk_cycles
-       $OPST --event dtlb_miss --event tablewalk_cycles >/dev/null 2>&1
-       if [ "$?" -ne "0" ]; then
-               die "Error starting oprofile, check oprofile_map_event.pl for 
appropriate dtlb_miss and tablewalk_cycles events."
-       fi
+       print_trace Warming the benchmark to avoid page faults of the binary
+       $STREAM >/dev/null 2>&1
 
-       print_trace Gathering baseline of STREAM performance
+       # Help debug problems launching oprofile
+       print_trace oprofile launch commands as follows
+       print_trace dtlb misses :: oprofile_start --event timer --event 
dtlb_miss --sample-cycle-factor 5
+       print_trace tablewalk cycles :: oprofile_start --event timer30 --event 
tablewalk_cycles --sample-cycle-factor 5 --sample-event-factor 
$SAMPLE_EVENT_FACTOR
+
+       print_trace Rerunning benchmark to measure number of DTLB misses
+       $OPST --event timer --event dtlb_miss --sample-cycle-factor 5 
>/dev/null 2>&1 || \
+               die "Error starting oprofile, check oprofile_map_event.pl for 
appropriate timer and dtlb_miss events."
        $STREAM >/dev/null 2>&1
 
-       print_trace Stopping profile
        opcontrol --stop >/dev/null 2>&1
        opcontrol --dump >/dev/null 2>&1
 
-       print_trace Dumping profile
-       RESULTS=`opreport | grep "_MISS_" | head -1`
-       MISS_SCALE=`echo $RESULTS | sed 's/.* count \([0-9]*\).*/\1/'`
-       RESULTS=`opreport | grep "_TABLEWALK_" | head -1`
-       WALK_SCALE=`echo $RESULTS | sed 's/.* count \([0-9]*\).*/\1/'`
-       RESULTS=`opreport | grep " STREAM" | head -1`
-
-       # This grabs the first whole number on the line which will be the
-       # number of table walk cycles, the second will be TLB misses
-       print_trace Analysing profile
-       TABLE_WALK=`echo "$RESULTS" | sed 's/[[:space:]]*\([0-9]*\).*/\1/'`
-       TLB_MISS=`echo "$RESULTS" | sed 
's/[[:space:]]*[0-9]*[[:space:]]*[[:graph:]]*[[:space:]]*\([0-9]*\).*/\1/'`
-       RESULTS=`opreport | grep " vmlinux" | head -1`
-       KERN_TABLE_WALK=`echo "$RESULTS" | sed 's/[[:space:]]*\([0-9]*\).*/\1/'`
-       KERN_TLB_MISS=`echo "$RESULTS" | sed 
's/[[:space:]]*[0-9]*[[:space:]]*[[:graph:]]*[[:space:]]*\([0-9]*\).*/\1/'`
-       if [[ "$KERN_TLB_MISS" != "" ]]; then
-               TLB_MISS=$(($TLB_MISS+$KERN_TLB_MISS))
+       # First ensure that the location of event counters are where we
+       # expect them to be. The expectation is that the timer30 is in
+       # the first column and the tablewalk_cycles is in the third
+       SAMPLES_START=`opreport | grep -n "^\s*samples" | head -1 | cut -d: -f1`
+       if [ "$SAMPLES_START" = "" ]; then
+               die Could not establish start of samples from opreport
+               SAMPLES_START=$(($COUNT_START+1))
+       fi
+       INDEX=`opreport | head -$SAMPLES_START | grep "^Counted .* events" | 
grep -n $TIMER_DTLB_EVENT | cut -d: -f1`
+       TIMER_DTLB_FIELD=$((1+2*($INDEX - 1)))
+       if [ $TIMER_DTLB_FIELD -ne 1 ]; then
+               die Timer events are not in the expected column, parse failure
        fi
-       if [[ "$KERN_TABLE_WALK" != "" ]]; then
-               TABLE_WALK=$(($TABLE_WALK+$KERN_TABLE_WALK))
+       INDEX=`opreport | head -$SAMPLES_START | grep "^Counted .* events" | 
grep -n $DTLB_EVENT | cut -d: -f1`
+       DTLB_FIELD=$((1+2*($INDEX - 1)))
+       if [ $DTLB_FIELD -ne 3 ]; then
+               die DTLB miss events are not in the expected column, parse 
failure
        fi
 
+       # Columns look ok, extract the event counts
+       TIMER_DTLB_SCALE=`opreport | grep "$TIMER_DTLB_EVENT" | head -1 | sed 
's/.* count \([0-9]*\).*/\1/'`
+       DTLB_SCALE=`opreport | grep "$DTLB_EVENT" | head -1 | sed 's/.* count 
\([0-9]*\).*/\1/'`
+
+       RESULTS=`opreport | grep " STREAM" | head -1`
+       STREAM_TIMER_DTLB=`echo "$RESULTS" | sed 
's/[[:space:]]*\([0-9]*\).*/\1/'`
+       STREAM_DTLB=`echo "$RESULTS" | sed 
's/[[:space:]]*[0-9]*[[:space:]]*[[:graph:]]*[[:space:]]*\([0-9]*\).*/\1/'`
+       RESULTS=`opreport | grep " vmlinux" | head -1`
+       KERNEL_TIMER_DTLB=`echo "$RESULTS" | sed 
's/[[:space:]]*\([0-9]*\).*/\1/'`
+       KERNEL_DTLB=`echo "$RESULTS" | sed 
's/[[:space:]]*[0-9]*[[:space:]]*[[:graph:]]*[[:space:]]*\([0-9]*\).*/\1/'`
+
+       TIMER_DTLB=$(($STREAM_TIMER_DTLB+$KERNEL_TIMER_DTLB))
+       DTLB=$(($STREAM_DTLB+$KERNEL_DTLB))
+
        print_trace Shutting down oprofile
        opcontrol --shutdown >/dev/null 2>&1
        opcontrol --deinit >/dev/null 2>&1
 
-       TLB_MISS=$(($TLB_MISS*$MISS_SCALE))
-       TABLE_WALK=$(($TABLE_WALK*$WALK_SCALE))
+       # Next STREAM needs to be run measuring the tablewalk_cycles. Because
+       # of differences in teh frequency CPU events occur, there are
+       # alterations in the timing. To make an accurate comparison, the
+       # cycle counts of the two profiles need to be very similar. oprofile
+       # does not give much help here in matching up different reports taking
+       # different readings so there is nothing really to do but run STREAM
+       # multiple times, scaling the events at different rates until a
+       # reasonably close match is found.
+
+       # The cycle counts for two oprofiles must be within 10% of each other
+       TOLERANCE=$(($TIMER_DTLB*4/100))
+       SAMPLE_EVENT_FACTOR=1
+       LOW_TIMER_WALK=0
+       HIGH_TIMER_WALK=0
+
+       print_trace Running benchmark to measure table walk cycles
+       while [ $TIMER_DTLB -ge $LOW_TIMER_WALK -a $TIMER_DTLB -ge 
$HIGH_TIMER_WALK ]; do
+
+               if [ $LOW_TIMER_WALK -ne 0 ]; then
+                       print_trace High diff with scaling 
x$LAST_SAMPLE_EVENT_FACTOR. Required $TIMER_DTLB +/ $TOLERANCE, got $TIMER_WALK
+               fi
+
+               $OPST --event timer30 --event tablewalk_cycles 
--sample-cycle-factor 5 --sample-event-factor $SAMPLE_EVENT_FACTOR >/dev/null 
2>&1 || \
+                       die "Error starting oprofile, check 
oprofile_map_event.pl for appropriate timer30 and tablewalk_cycles events."
+               $STREAM >/dev/null 2>&1
+
+               opcontrol --stop >/dev/null 2>&1
+               opcontrol --dump >/dev/null 2>&1
+
+               # Extract the event counts
+               TIMER_WALK_SCALE=`opreport | grep "$TIMER_WALK_EVENT" | head -1 
| sed 's/.* count \([0-9]*\).*/\1/'`
+               WALK_SCALE=`opreport | grep "$WALK_EVENT" | head -1 | sed 's/.* 
count \([0-9]*\).*/\1/'`
+
+               RESULTS=`opreport | grep " STREAM" | head -1`
+               STREAM_TIMER_WALK=`echo "$RESULTS" | sed 
's/[[:space:]]*\([0-9]*\).*/\1/'`
+               STREAM_WALK=`echo "$RESULTS" | sed 
's/[[:space:]]*[0-9]*[[:space:]]*[[:graph:]]*[[:space:]]*\([0-9]*\).*/\1/'`
+               RESULTS=`opreport | grep " vmlinux" | head -1`
+               KERNEL_TIMER_WALK=`echo "$RESULTS" | sed 
's/[[:space:]]*\([0-9]*\).*/\1/'`
+               KERNEL_WALK=`echo "$RESULTS" | sed 
's/[[:space:]]*[0-9]*[[:space:]]*[[:graph:]]*[[:space:]]*\([0-9]*\).*/\1/'`
+
+               TIMER_WALK=$(($STREAM_TIMER_WALK+$KERNEL_TIMER_WALK))
+               WALK=$(($STREAM_WALK+$KERNEL_WALK))
+
+               LOW_TIMER_WALK=$(($TIMER_WALK-$TOLERANCE))
+               HIGH_TIMER_WALK=$(($TIMER_WALK+$TOLERANCE))
+
+               # Scale faster if the difference between timers is huge
+               LAST_SAMPLE_EVENT_FACTOR=$SAMPLE_EVENT_FACTOR
+               if [ $(($TIMER_DTLB*3/4-$HIGH_TIMER_WALK)) -gt 0 ]; then
+                       SAMPLE_EVENT_FACTOR=$(($SAMPLE_EVENT_FACTOR+3))
+               elif [ $(($TIMER_DTLB*9/10-$HIGH_TIMER_WALK)) -gt 0 ]; then
+                       SAMPLE_EVENT_FACTOR=$(($SAMPLE_EVENT_FACTOR+2))
+               else
+                       SAMPLE_EVENT_FACTOR=$(($SAMPLE_EVENT_FACTOR+1))
+               fi
+       
+               opcontrol --shutdown >/dev/null 2>&1
+               opcontrol --deinit >/dev/null 2>&1
+       done
+
+       print_trace "Cycle DTLB Scale: $TIMER_DTLB_SCALE"
+       print_trace "Cycle Walk Scale: $TIMER_WALK_SCALE"
+       print_trace "DTLB       Scale: $DTLB_SCALE"
+       print_trace "Walk       Scale: $WALK_SCALE"
+       print_trace "Cycle DTLB events: $STREAM_TIMER_DTLB + $KERNEL_TIMER_DTLB 
= $TIMER_DTLB"
+       print_trace "Cycle Walk events: $STREAM_TIMER_WALK + $KERNEL_TIMER_WALK 
= $TIMER_WALK"
+       print_trace "DTLB    events: $STREAM_DTLB + $KERNEL_DTLB = $DTLB"
+       print_trace "Walk    events: $STREAM_WALK + $KERNEL_WALK = $WALK"
+
+       # This shouldn't happen. One would expect that the minimum sample
+       # rate for any of the timers in any groups is the same. If they
+       # differ, it might be a simple bug in oprofile_map_event that
+       # needs fixing. In the event this bug is reported, get the CPU
+       # type and the output of opcontrol --list-events
+       if [ $TIMER_DTLB_SCALE -ne $TIMER_WALK_SCALE ]; then
+               die Cycle CPUs were sampled at different rates.
+       fi
 
-       LAST_LATENCY_CYCLES=$(($TABLE_WALK/$TLB_MISS))
+       # Final calculation. Ignore kernel overhead in this version as it
+       # might be due to interference from other processes
+       TLB_MISSES=$(($DTLB_SCALE*$DTLB))
+       TLB_CYCLES=$(($WALK_SCALE*$WALK))
+       LAST_LATENCY_CYCLES=$(($TLB_CYCLES/$TLB_MISSES))
 }
 
 ARGS=`getopt -o c:s:vq --long 
calibrator:,stream:,verbose,quiet,fetch-calibrator,fetch-stream -n 
'tlbmiss_cost.sh' -- "$@"`
-- 
1.6.3.3


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Libhugetlbfs-devel mailing list
Libhugetlbfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libhugetlbfs-devel

[Libhugetlbfs-devel] [PATCH 7/7] tlbmiss_cost.sh: Rework method by which TLB miss cost is calculated with oprofile

Reply via email to