Hi Hao Sun,

> 3. The problem is the simulation showed Killed after 20+hours, I am not
> sure where is the problem. But when I run single benchmark, there is no
> problem. So
>      i. can I use *export GOMP_CPU_AFFINITY="0-7 8-15" to set cpu
> affinity? *I am not sure which benchmarks support OpenMP, or could you
> tell me the right way to bind different benchmark to different cpu?
>

Unfortunately, the pre-compiled PARSEC benchmarks on the disk image use
Pthreads for parallelization rather than OpenMP, and they were not compiled
to set CPU thread affinity. The affinity environment variable you're
exporting won't affect these benchmarks.
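If your disk image happens to include the `taskset` utility (it ships with util-linux; I don't recall whether the pre-compiled image has it, so treat this as an untested sketch), you could pin each benchmark process, and any Pthreads it spawns, to a CPU mask directly from the rcS file:

```shell
#!/bin/sh
# Hypothetical rcS fragment -- assumes a 'taskset' binary exists on the
# disk image. A child process inherits its parent's affinity mask, so
# the Pthreads each benchmark spawns stay on the assigned CPUs.
cd /parsec/install/bin
/sbin/m5 dumpresetstats
taskset 0x00ff ./blackscholes 8 <params> &   # mask 0x00ff = CPUs 0-7
taskset 0xff00 ./bodytrack <params> 8        # mask 0xff00 = CPUs 8-15
```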

Given the complexity of what you're trying to simulate, it is likely you'll
need to modify either the disk image or the benchmarks themselves. You may
want to familiarize yourself with that process as described in our tech
report: http://www.cs.utexas.edu/~parsec_m5/TR-09-32.pdf


>      ii. I am trying to issue 16 threads (8 for blackschole, 8 for
> bodytrack), so is my rcS file right?


Technically, yes, your rcS file will run the benchmarks concurrently and
with the expected number of threads. However, running a multithreaded AND
multiprocess workload like this is going to be very tricky for a couple of
reasons:

  First, note that when you use the Linux terminal command '&', the last
command is the one that gates the progress of the terminal thread. This
means you can get race conditions and the simulation can exit by falling
through to the '/sbin/m5 exit' before some of the applications have
completed. Based on your rcS file and the behavior you're seeing, I suspect
you may be running into this problem. I'd encourage you to play around with
the following toy example in a standard bash terminal to see what I mean:

  % (sleep 2; echo "first") & (sleep 1; echo "second")
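If you do want the terminal thread to block until every background job has
finished, the shell's built-in `wait` does exactly that. A minimal sketch
(standard sh, nothing gem5-specific):

```shell
#!/bin/sh
# Both commands run in the background; 'wait' blocks until every
# background child has exited, so the final echo cannot race ahead.
(sleep 2; echo "first") &
(sleep 1; echo "second") &
wait
echo "all done"
```

In your rcS, placing `wait` before the `/sbin/m5 exit` would keep the
simulation alive until both benchmarks complete.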


  Second, it seems you may be trying to not just get the benchmarks to run
concurrently, but to also get their regions of interest (ROIs) to run
concurrently. This is an even trickier problem to address than the process
race problem. Depending on the way each benchmark works, it may take a
varying amount of time for the control threads to set up the work and
launch the worker threads, which means that one benchmark may actually
complete its ROI before the other one even starts its ROI. If in fact you
are trying to get the ROIs to run concurrently, you will probably need to
do one or both of the following:
   A) You can extend the benchmark ROIs by increasing input set sizes or by
modifying the benchmarks to loop over the ROI multiple times (the latter is
often used in contention management papers when measuring workload
throughput).
   B) Another option is to do some sophisticated, delayed benchmark
launching. Here's an example rcS file snippet that would do that:

----------------------------------------------
#
# First, ensure that the first benchmark doesn't need to complete or the
# second benchmark runs longer than the first.
# Second, start the benchmark that takes longer to get to ROI
# Third, delay for roughly the difference in to-ROI run time
# Fourth, launch the second benchmark
# NOTE 1: Comments in an rcS can affect control process timing
# NOTE 2: The sleep command is pretty imprecise (by up to ms)
# NOTE 3: x264 runs longer than bandwidth_bench, but bandwidth_bench
#               takes 0.03s longer to get to its ROI
#
/sbin/m5 dumpresetstats
./bandwidth_bench &
sleep 0.03
parsec/install/bin/x264 <params>
/sbin/m5 dumpresetstats
echo "Done :D"
----------------------------------------------

   You'd need to run these benchmarks in isolation to collect the time it
takes them to get to the ROI. It is also likely that you'd encounter some
hairy non-determinism in the run times, especially if there may be
contention for shared resources.
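For what it's worth, if the pre-ROI phase is bracketed by stats dumps (e.g.
a dumpresetstats at launch and another at ROI entry, assuming your binaries
have the m5 ROI annotations compiled in), the to-ROI time shows up as the
`sim_seconds` value of the first dump in m5out/stats.txt. A quick way to
pull it out on the host (the file name and stat name are the gem5 defaults
of that era; adjust if yours differ):

```shell
#!/bin/sh
# Print sim_seconds from the first stats dump in a gem5 stats.txt.
# 'exit' stops awk after the first match, i.e. the first dump section.
awk '/^sim_seconds/ { print $2; exit }' m5out/stats.txt
```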

  Hope this helps,
  Joel



On Tue, Oct 28, 2014 at 10:59 AM, Hao Sun <[email protected]>
wrote:

> Dear Joel Hestness,
>
> Sorry to bother you, and I really need your help. I am trying to run 2
> different parsec benchmarks on 2 groups of cpus, eg, *totally 16 cpus,
> the first 8 cpus running blackscholes, and the other 8 cpus running
> bodytrack benchmark*. I am running in the* full system mode*. I use the
> pre-compile image file from http://www.cs.utexas.edu/~parsec_m5/
>
> *1. My gem5 command is:*
> ./build/ALPHA_FS/gem5.opt  ./configs/example/fs.py -n 16
> --script=./configs/boot/runScript/blackscholes_bodytrack_8_8.rcS
>
> *2. The corresponding rcS file is:*
>
> #!/bin/sh
>
> # File to run the blackscholes benchmark and bodytrack
>
> export GOMP_CPU_AFFINITY="0-7 8-15"
> cd /parsec/install/bin
> /sbin/m5 dumpstats
> /sbin/m5 resetstats
> ./blackscholes 8 /parsec/install/inputs/blackscholes/in_4K.txt
> /parsec/install/inputs/blackscholes/prices.txt & ./bodytrack
> /parsec/install/inputs/bodytrack/sequenceB_1 4 1 1000 5 0 8
> echo "Done :D"
> /sbin/m5 exit
> /sbin/m5 exit
>
> 3. The problem is the simulation showed Killed after 20+hours, I am not
> sure where is the problem. But when I run single benchmark, there is no
> problem. So
>      i. can I use *export GOMP_CPU_AFFINITY="0-7 8-15" to set cpu
> affinity? *I am not sure which benchmarks support OpenMP, or could you
> tell me the right way to bind different benchmark to different cpu?
>      ii. I am trying to issue 16 threads (8 for blackschole, 8 for
> bodytrack), so is my rcS file right?
>
> Thanks for taking the time to read my email! I really need your help; I
> have been stuck on this problem for 3 weeks. Thanks in advance!
>
> Best regards,
> Hao Sun
> Northwestern University
>



-- 
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
