I don't see anything badly amiss in your output, but the number of times
open (and similar syscalls) fail is interesting -- Kyle, do we see the same
thing? If not, you may need to update your apu_se.py so that it searches
the "correct" location for the libraries first.
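
To compare how often those calls fail on each of our setups, something like
the rough Python sketch below could tally them from the trace. It assumes
the failure lines contain text of the form "syscall <name>: failed", as in
the message Dan quoted; the function name is hypothetical, and the pattern
may need adjusting to match the real trace.

```python
import re
from collections import Counter

# Assumed line shape: "... syscall <name>: failed ...", taken from the
# message quoted in this thread. Adjust if your trace differs.
FAILED = re.compile(r"syscall (\w+): failed")


def count_failed_syscalls(lines):
    """Return a Counter mapping syscall name -> number of 'failed' lines."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open trace file to count_failed_syscalls() and printing
.most_common() on the result should list the syscalls that fail most
often, which we can then compare between machines.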

Also, based on Kyle's reply: Dan, how long does it take you to boot up
square?  A slower machine will certainly take longer, but your run still
seems slower than I would expect.  If we are all running the same
application, it should be easier to spot differences.
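
For comparing runs, the host-side numbers in the stats header Dan quoted
below are probably the easiest thing to line up. Here is a minimal Python
sketch, assuming the usual "name value # comment" layout of a stats dump;
the helper names are made up for illustration. With Dan's numbers,
host_seconds of 548.53 is roughly 9.1 minutes, and sim_insts divided by
host_inst_rate gives about the same figure, so the stats agree with the
9 minutes he reported.

```python
def parse_stats(text):
    """Parse 'name value # comment' lines into a dict of floats.

    Assumes a plain stats dump; banner lines and wrapped comment
    lines are skipped because their second field is not numeric.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            try:
                stats[fields[0]] = float(fields[1])
            except ValueError:
                pass  # not a "name value" line
    return stats


def host_minutes(stats):
    """Wall-clock time of the run on the host, in minutes."""
    return stats["host_seconds"] / 60.0


def estimated_host_seconds(stats):
    """Cross-check: simulated instructions / host instruction rate."""
    return stats["sim_insts"] / stats["host_inst_rate"]
```

If everyone runs this over the stats from the same application (square,
say), the host_seconds and host_inst_rate values can be compared directly.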

I would also recommend updating to the latest commit on the staging branch
-- I don't believe it should break anything with those patches.

Yes, looks like you are using the release version of ROCm -- no issues
there.

Matt



On Thu, Jun 11, 2020 at 9:38 AM Daniel Gerzhoy <daniel.gerz...@gmail.com>
wrote:

> I am using the docker, yeah.
> It's running on our server cluster, which is a Xeon Gold 5117 (2.0 - 2.8
> GHz). That might make up some of the difference, since the r5 3600 has a
> faster clock (3.6 - 4.2 GHz).
>
> I've hesitated to update my branch because in the Dockerfile it
> specifically checks this branch out and applies a patch, though the patch
> isn't very extensive.
> This was from a while back (November maybe?) and I know you guys have been
> integrating things into the main branch (thanks!)
> I was thinking I would wait until it's fully merged into the mainline gem5
> branch and rebase onto that and try to merge my changes in.
>
> Last I checked the GCN3 stuff is in the dev branch not the master right?
>
> But if it will help maybe I should update to the head of this branch. Will
> I need to update the docker as well?
>
> As for the debug vs release rocm I think I'm using the release version.
> This is what the dockerfile built:
>
> ARG rocm_ver=1.6.2
> RUN wget -qO- repo.radeon.com/rocm/archive/apt_${rocm_ver}.tar.bz2 \
>     | tar -xjv \
>     && cd apt_${rocm_ver}/pool/main/ \
>     && dpkg -i h/hsakmt-roct-dev/* \
>     && dpkg -i h/hsa-ext-rocr-dev/* \
>     && dpkg -i h/hsa-rocr-dev/* \
>     && dpkg -i r/rocm-utils/* \
>     && dpkg -i h/hcc/* \
>     && dpkg -i h/hip_base/* \
>     && dpkg -i h/hip_hcc/* \
>     && dpkg -i h/hip_samples/*
>
>
> I ran a benchmark that prints that it entered main and returns
> immediately; this took 9 minutes.
> I've attached a debug trace with debug flags = "GPUDriver,SyscallVerbose".
> There are a lot of weird things going on: "syscall open: failed", "syscall
> brk: break point changed to [...]", and lots of ignored system calls.
>
> head of Stats for reference:
> ---------- Begin Simulation Statistics ----------
> sim_seconds                          0.096192  # Number of seconds simulated
> sim_ticks                         96192368500  # Number of ticks simulated
> final_tick                        96192368500  # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
> sim_freq                        1000000000000  # Frequency of simulated ticks
> host_inst_rate                         175209  # Simulator instruction rate (inst/s)
> host_op_rate                           338409  # Simulator op (including micro ops) rate (op/s)
> host_tick_rate                      175362515  # Simulator tick rate (ticks/s)
> host_mem_usage                        1628608  # Number of bytes of host memory used
> host_seconds                           548.53  # Real time elapsed on the host
> sim_insts                            96108256  # Number of instructions simulated
> sim_ops                             185628785  # Number of ops (including micro ops) simulated
> system.voltage_domain.voltage               1  # Voltage in Volts
> system.clk_domain.clock                  1000  # Clock period in ticks
>
> Maybe something in the attached file explains it better than I can express.
>
> Many thanks for your help and hard work!
>
> Dan
>
>
>
>
>
> On Thu, Jun 11, 2020 at 3:32 AM Kyle Roarty <kroa...@wisc.edu> wrote:
>
>> Running through a few applications, it took me about 2.5 minutes or less
>> each time using docker to start executing the program on an r5 3600.
>>
>> I ran square, dynamic_shared, and MatrixTranspose (All from HIP) which
>> took about 1-1.5 mins.
>>
>> I ran conv_bench and rnn_bench from DeepBench which took just about 2
>> minutes.
>>
>> Because of that, it's possible the size of the app has an effect on setup
>> time, as the HIP apps are extremely small.
>>
>> Also, the commit Dan is checked out on is d0945dc ("mem-ruby: add cache
>> hit/miss statistics for TCP and TCC",
>> https://gem5.googlesource.com/amd/gem5/+/d0945dc285cf146de160808d7e6d4c1fd3f73639),
>> which isn't the most recent commit. I don't believe that would account
>> for such a large slowdown, but it doesn't hurt to try the newest commit
>> unless it breaks something.
>>
>> Kyle
>> ------------------------------
>> *From:* mattdsincl...@gmail.com <mattdsincl...@gmail.com>
>> *Sent:* Thursday, June 11, 2020 1:15 AM
>> *To:* gem5 users mailing list <gem5-users@gem5.org>
>> *Cc:* Daniel Gerzhoy <daniel.gerz...@gmail.com>; GAURAV JAIN <
>> gja...@wisc.edu>; Kyle Roarty <kroa...@wisc.edu>
>> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>>
>> Gaurav & Kyle, do you know if this is the case?
>>
>> Dan, I believe the short answer is yes, although 7-8 minutes seems a
>> little long.  Are you running this in Kyle's Docker, or separately?
>> Docker does add some overhead, so running directly on a system would
>> likely be somewhat faster.  Also, are you running with the release or
>> debug version of the ROCm drivers?  The debug version will likely add
>> some time as well.
>>
>> Matt
>>
>> On Wed, Jun 10, 2020 at 2:00 PM Daniel Gerzhoy via gem5-users <
>> gem5-users@gem5.org> wrote:
>>
>> I've been running simulations using the GCN3 branch:
>>
>> rocm_ver=1.6.2
>> $git branch
>>    * (HEAD detached at d0945dc)
>>       agutierr/master-gcn3-staging
>>
>> And I've noticed that it takes roughly 7-8 minutes to get to main()
>>
>> I'm guessing that this is the simulator setting up drivers?
>> Is that correct? Is there other stuff going on?
>>
>> *Has anyone found a way to speed this up? *
>>
>> I am trying to get some of the rodinia benchmarks from the HIP-Examples
>> running and debugging takes a long time as a result.
>>
>> I suspect that this is unavoidable but I won't know if I don't ask!
>>
>> Cheers,
>>
>> Dan Gerzhoy
>> _______________________________________________
>> gem5-users mailing list -- gem5-users@gem5.org
>> To unsubscribe send an email to gem5-users-le...@gem5.org
>>
>>