Hi Hao Sun,
> 3. The problem is the simulation showed Killed after 20+ hours, and I am
> not sure where the problem is. But when I run a single benchmark, there
> is no problem. So
>     i. can I use export GOMP_CPU_AFFINITY="0-7 8-15" to set cpu
> affinity? I am not sure which benchmarks support OpenMP, or could you
> tell me the right way to bind different benchmarks to different cpus?

Unfortunately, the pre-compiled PARSEC benchmarks on the disk image use
Pthreads for parallelization rather than OpenMP, and they were not compiled
to set CPU thread affinity, so the affinity environment variable you're
exporting won't affect them. Given the complexity of what you're trying to
simulate, it is likely you'll need to modify either the disk image or the
benchmarks themselves. You may want to familiarize yourself with that
process as described in our tech report:
http://www.cs.utexas.edu/~parsec_m5/TR-09-32.pdf

>     ii. I am trying to issue 16 threads (8 for blackscholes, 8 for
> bodytrack), so is my rcS file right?

Technically, yes: your rcS file will run the benchmarks concurrently and
with the expected numbers of threads. However, running a multithreaded AND
multiprocess workload like this is going to be very tricky for a couple of
reasons.

First, note that when you use the shell operator '&', only the last command
gates the progress of the terminal thread. This means you can get race
conditions, and the simulation can exit by falling through to the
'/sbin/m5 exit' before some of the applications have completed. Based on
your rcS file and the behavior you're seeing, I suspect you may be running
into this problem. I'd encourage you to play around with the following toy
example in a standard bash terminal to see what I mean:

  % (sleep 2; echo "first") & (sleep 1; echo "second")

Second, it seems you may be trying not just to get the benchmarks to run
concurrently, but also to get their regions of interest (ROIs) to run
concurrently.
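In case it's useful, here is a minimal sketch of one way to close that
race: the POSIX 'wait' builtin blocks until all background jobs have
exited, so placing 'wait' before '/sbin/m5 exit' keeps the script from
falling through early. This assumes the shell on your disk image provides
'wait' (it is a POSIX builtin, so most do); the snippet is runnable in
plain bash outside gem5:

```shell
# Extending the toy example above: without 'wait', the script can proceed
# after ~1s (when the foreground "second" job finishes), while the
# backgrounded "first" job is still running. 'wait' blocks until all
# background jobs have exited.
(sleep 2; echo "first") & (sleep 1; echo "second")
wait                  # blocks ~1 more second, until "first" has printed
echo "all jobs done"  # only reached after every job has finished
```

In an rcS file, the same 'wait' would go immediately before the
'/sbin/m5 exit' line.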
This is an even trickier problem to address than the process race problem.
Depending on how each benchmark works, it may take a varying amount of
time for the control thread to set up the work and launch the worker
threads, which means that one benchmark may actually complete its ROI
before the other one even starts its ROI. If in fact you are trying to get
the ROIs to run concurrently, you will probably need to do one or both of
the following:

A) You can extend the benchmark ROIs by increasing input set sizes or by
modifying the benchmarks to loop over the ROI multiple times (the latter
is often done in contention management papers when measuring workload
throughput).

B) Another option is to do some sophisticated, delayed benchmark
launching. Here's an example rcS file snippet that would do that:

----------------------------------------------
#
# First, ensure that the first benchmark doesn't need to complete, or
# that the second benchmark runs longer than the first.
# Second, start the benchmark that takes longer to get to its ROI.
# Third, delay for roughly the difference in to-ROI run time.
# Fourth, launch the second benchmark.
# NOTE 1: Comments in an rcS file can affect control process timing
# NOTE 2: The sleep command is pretty imprecise (by up to ms)
# NOTE 3: x264 runs longer than bandwidth_bench, but bandwidth_bench
#         takes 0.03s longer to get to its ROI
#
/sbin/m5 dumpresetstats
./bandwidth_bench &
sleep 0.03
parsec/install/bin/x264 <params>
/sbin/m5 dumpresetstats
echo "Done :D"
----------------------------------------------

You'd need to run these benchmarks in isolation to collect the time it
takes them to get to their ROIs. It is also likely that you'd encounter
some hairy non-determinism in the run times, especially if there is
contention for shared resources.

Hope this helps,
  Joel

On Tue, Oct 28, 2014 at 10:59 AM, Hao Sun <[email protected]> wrote:

> Dear Joel Hestness,
>
> Sorry to bother you, and I really need your help.
> I am trying to run 2 different parsec benchmarks on 2 groups of cpus,
> e.g., totally 16 cpus, the first 8 cpus running blackscholes, and the
> other 8 cpus running the bodytrack benchmark. I am running in full
> system mode. I use the pre-compiled image file from
> http://www.cs.utexas.edu/~parsec_m5/
>
> 1. My gem5 command is:
> ./build/ALPHA_FS/gem5.opt ./configs/example/fs.py -n 16
> --script=./configs/boot/runScript/blackscholes_bodytrack_8_8.rcS
>
> 2. The corresponding rcS file is:
>
> #!/bin/sh
>
> # File to run the blackscholes and bodytrack benchmarks
>
> export GOMP_CPU_AFFINITY="0-7 8-15"
> cd /parsec/install/bin
> /sbin/m5 dumpstats
> /sbin/m5 resetstats
> ./blackscholes 8 /parsec/install/inputs/blackscholes/in_4K.txt
> /parsec/install/inputs/blackscholes/prices.txt &
> ./bodytrack /parsec/install/inputs/bodytrack/sequenceB_1 4 1 1000 5 0 8
> echo "Done :D"
> /sbin/m5 exit
> /sbin/m5 exit
>
> 3. The problem is the simulation showed Killed after 20+ hours, and I
> am not sure where the problem is. But when I run a single benchmark,
> there is no problem. So
>     i. can I use export GOMP_CPU_AFFINITY="0-7 8-15" to set cpu
> affinity? I am not sure which benchmarks support OpenMP, or could you
> tell me the right way to bind different benchmarks to different cpus?
>     ii. I am trying to issue 16 threads (8 for blackscholes, 8 for
> bodytrack), so is my rcS file right?
>
> Thanks for taking the time to read my email! I really need your help; I
> have been stuck on this problem for 3 weeks. Thanks in advance!
>
> Best regards,
> Hao Sun
> Northwestern University

--
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
