On 1/5/2024 1:35 PM, saras nanda via gem5-users wrote:
Hi everyone,

I am running a full-system simulation (ARM architecture) using fs_bigLITTLE.py.
My benchmark uses the numpy library and runs under gem5 FS. The problem is that the benchmark spends a very long time (3-4 days) just on importing numpy, and I still have not seen the import complete.

I am using the following command:

./build/ARM/gem5.opt configs/example/arm/fs_bigLITTLE.py --kernel=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/binaries/vmlinux.arm64 --disk=/home/saras/gem5-resources/src/arm-ubuntu/gem5/full_system_images/disks/arm64-ubuntu-server.img --caches --cpu-type=atomic --kernel-init=/bin/bash

Is it due to the unstable linux environment booted using /bin/bash?

I cannot claim it is unstable, as I don't get any errors or see any anomalous behaviour; it just keeps running the benchmark, which has import numpy as its first statement.

I am unable to debug this problem down to its root cause.

Any help would be much appreciated.

Thank you in advance.

You keep posting about this, and I am sorry we don't seem to have an answer.
I have a few comments / questions, though ...

What do you mean by "unstable linux environment booted using /bin/bash"?

The word "unstable" would generally mean something like "prone to unpredictable
failure".  Here, I think you mean something a little different, along the lines
of "performs in a way I do not understand."

Rather than running your full benchmark, I wonder if you are able to start
python3, import numpy, then quit, and if so, how long that takes.
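
One way to run that experiment, as a minimal sketch: start a fresh interpreter,
import the module, and time the whole thing.  The helper name time_import is
hypothetical, and the sketch assumes the same python3 that runs the benchmark
is the one launching it.

```python
import subprocess
import sys
import time


def time_import(module: str = "numpy", python: str = sys.executable) -> float:
    """Return wall-clock seconds for a fresh interpreter to import `module` and exit."""
    start = time.perf_counter()
    # A separate process is used so interpreter start-up and a cold import are
    # both included, as they would be at the top of a benchmark script.
    subprocess.run([python, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start
```

Calling print(time_import()) natively and then again inside the simulated
system gives two numbers that can be compared directly.  If the interpreter is
Python 3.7 or newer, python3 -X importtime -c "import numpy" additionally shows
which submodules dominate the import.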

On my modern, fairly fast laptop, outside gem5, it takes about 1.8 seconds.
Allowing a 10,000x slowdown for gem5 simulation of a program (I would hope the
slowdown is not that bad if you are actually running AtomicSimple or a
similarly fast CPU model), that would mean about 5 hours to simulate.
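
The arithmetic behind that figure, as a small sketch.  The 1.8 s and 10,000x
values are the ones quoted above; the function names are hypothetical.  The
second helper inverts the calculation to show what slowdown a 3-4 day import
would imply if the native time really were 1.8 s.

```python
def simulated_seconds(native_seconds: float, slowdown: float) -> float:
    """Host wall-clock seconds to simulate work that takes native_seconds natively."""
    return native_seconds * slowdown


def implied_slowdown(observed_seconds: float, native_seconds: float) -> float:
    """Slowdown factor implied by an observed simulation time."""
    return observed_seconds / native_seconds


NATIVE_IMPORT_S = 1.8      # measured on the laptop, outside gem5
ASSUMED_SLOWDOWN = 10_000  # pessimistic guess, not a measured gem5 figure

estimate = simulated_seconds(NATIVE_IMPORT_S, ASSUMED_SLOWDOWN)
print(f"estimated: {estimate:.0f} s = {estimate / 3600:.1f} h")

for days in (3, 4):
    factor = implied_slowdown(days * 86_400, NATIVE_IMPORT_S)
    print(f"{days} days would imply a slowdown of about {factor:,.0f}x")
```

If the implied factor comes out far above anything plausible for the CPU model
in use, that suggests the time is going somewhere other than the import itself.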

On a server system I was able to run: perf stat -d python3 -c "import numpy"
and the result was:

 Performance counter stats for 'python3 -c import numpy':

            979.65 msec task-clock:u              #    3.559 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
             6,201      page-faults:u             #    0.006 M/sec
       474,329,129      cycles:u                  #    0.484 GHz
       566,694,729      instructions:u            #    1.19  insn per cycle
       129,243,270      branches:u                #  131.928 M/sec
         4,391,610      branch-misses:u           #    3.40% of all branches
       166,903,996      L1-dcache-loads:u         #  170.371 M/sec
         9,824,995      L1-dcache-load-misses:u   #    5.89% of all L1-dcache hits
         3,815,564      LLC-loads:u               #    3.895 M/sec
           103,009      LLC-load-misses:u         #    2.70% of all LL-cache hits

       0.275233187 seconds time elapsed

       0.626476000 seconds user
       0.355270000 seconds sys

The most relevant measure may be the 500-600 million instructions needed.  To
get a sense of how long this will take under gem5, we need a sense of how many
instructions it can simulate per second.  Let's suppose you have a 3 GHz host
processor with the previously mentioned 10,000x slowdown in gem5.  That is as
if the simulated CPU were running at 300 kHz.  Assuming two cycles per
instruction and no pipelining, you need about 1 to 1.2 billion simulated
cycles.  Dividing 1.2 billion by 300,000 gives 4000 seconds of simulation
time, a little over an hour.
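
The same estimate as a short sketch, so the assumptions can be varied.  The
3 GHz host clock, 10,000x slowdown and two cycles per instruction are the rough
guesses from the paragraph above, not measured gem5 numbers, and the function
name is hypothetical.

```python
def estimate_sim_seconds(
    instructions: float,
    cycles_per_instruction: float = 2.0,
    host_hz: float = 3e9,
    slowdown: float = 10_000,
) -> float:
    """Estimate host wall-clock seconds needed to simulate `instructions`."""
    effective_hz = host_hz / slowdown            # 3 GHz / 10,000 = 300 kHz
    simulated_cycles = instructions * cycles_per_instruction
    return simulated_cycles / effective_hz


# Bracket the estimate with 500 and 600 million instructions, then use the
# instruction count reported by perf stat.
for insts in (500e6, 600e6, 566_694_729):
    secs = estimate_sim_seconds(insts)
    print(f"{insts:>13,.0f} instructions: {secs:7.0f} s ({secs / 60:.0f} min)")
```

Changing slowdown or cycles_per_instruction shows how sensitive the result is
to each guess; both scale the estimate linearly.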

Given the roughness of these calculations and differences between my laptop
(which was using WSL under Windows) versus a native Linux installation on the
server, the agreement between the two estimates (about 5 hours and a little
over an hour) seems reasonable to me.

Note that this is the amount of time needed after you have booted the OS.
Your benchmark could also be doing a lot of other work that is somehow being
conflated with the import.  I am not sure how you are concluding that it is in
the middle of importing numpy, but I don't mean to question what you are
doing.  Differences in the details and versions of python, numpy, etc. could
also play a part.  Lastly, I am giving stats for x86; ARM could clearly be
somewhat different, though probably not by a factor of 10.

Do you have an actual ARM machine where you can measure the time needed
outside gem5, for the same application code and OS?  That would give a
baseline against which to compare.

I hope something here helps.

EM