Hi Dan,

Glad to hear things are working, and thanks for the tips!  I must admit I
don't quite follow what the solution was, though -- are you saying the
solution is to replace exit(0)/return with m5_exit()?  I thought your
original post said the problem was things taking a really long time before
main()?  If so, it would seem like something else must have been the
actual problem/solution?
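
To make sure we're talking about the same thing: by m5_exit() I mean the
magic op from gem5's util/m5 library.  A minimal sketch of that
replacement (assuming the m5 ops library is built and linked in; the
header name/path may differ in your checkout):

    #include "m5ops.h"  /* from gem5's util/m5; path is checkout-dependent */

    int main(void) {
        /* ... benchmark work ... */
        m5_exit(0);  /* tell gem5 to end simulation now (0 = no delay) */
    }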

As for your other questions: I don't recall exactly what the root cause
of the hipDeviceSynchronize failure was, but I would definitely recommend
updating to the current staging branch head first and testing.  I am also
hoping to push a fix today for the barrier bit synchronization -- most of
the hipDeviceSynchronize-type failures I've seen were due to a bug in my
barrier bit implementation.  I'm not sure if this will be the solution to
your problem or not, but I can definitely add you as a reviewer and/or
point you to it if needed.

Not sure about the m5op; hopefully someone else can chime in on that.

Thanks,
Matt

On Fri, Jun 12, 2020 at 12:12 PM Daniel Gerzhoy <daniel.gerz...@gmail.com>
wrote:

> I've figured it out.
>
> To measure the time it took to get to main(), I put a *return 0;* at the
> beginning of the function so I wouldn't have to babysit it.
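>
> In other words, roughly this (a sketch of the hack, not the real
> benchmark code):
>
>     int main(int argc, char **argv) {
>         /* bail out immediately: host time spent up to here is simulator
>            setup plus whatever it takes to reach main() */
>         return 0;
>         /* ... the actual benchmark code below never runs ... */
>     }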
>
> I didn't consider that it would also take some time for the simulator to
> exit, which is where the extra few minutes come from.
> Side note: calling *m5_exit(0);* instead of returning exits immediately.
>
> 5 minutes is a more reasonable slowdown given the difference between the
> two machines' clocks.
>
> Two incidental things:
>
> 1. Is there a way to have gem5 spit out (real wall-clock) timestamps while
> it's printing stuff?
> 2. A while ago I asked about hipDeviceSynchronize() causing crashes
> (panic: Tried to read unmapped address 0xff0000c29f48.). Has this been
> fixed since then?
>
> I'm going to update to the head of this branch soon, and eventually to the
> main branch. In case it hasn't been fixed, I've created a workaround: I
> steal the completion signal of the kernel based on its launch id and
> manually wait on it using the HSA interface.
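>
> The wait itself is plain HSA; the hacky part is fishing the completion
> signal out of the runtime by launch id. A rough sketch of the waiting
> half (however you obtain the hsa_signal_t):
>
>     #include <hsa.h>
>
>     /* block until the kernel's completion signal drops below 1,
>        i.e. the dispatch packet has completed */
>     void wait_for_kernel(hsa_signal_t completion_signal) {
>         hsa_signal_wait_acquire(completion_signal, HSA_SIGNAL_CONDITION_LT,
>                                 1, UINT64_MAX, HSA_WAIT_STATE_BLOCKED);
>     }
>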
> Happy to help out and implement this as an m5op (or something) if that
> would be helpful for you guys.
>
> Best,
>
> Dan
>
> On Thu, Jun 11, 2020 at 12:40 PM Matt Sinclair <mattdsincl...@gmail.com>
> wrote:
>
>> I don't see anything amazingly amiss in your output, but the number of
>> times the open/etc. calls fail is interesting -- Kyle, do we see the same
>> thing?  If not, it could be that you need to update your apu_se.py to
>> point to the "correct" place to search for the libraries first?
>>
>> Also, based on Kyle's reply: Dan, how long does it take you to boot up
>> square?  Certainly a slower machine might take longer, but yours seems
>> even slower than expected.  If we're both trying the same application,
>> it should be easier to spot differences.
>>
>> I would also recommend updating to the latest commit on the staging
>> branch -- I don't believe it should break anything with those patches.
>>
>> Yes, looks like you are using the release version of ROCm -- no issues
>> there.
>>
>> Matt
>>
>>
>>
>> On Thu, Jun 11, 2020 at 9:38 AM Daniel Gerzhoy <daniel.gerz...@gmail.com>
>> wrote:
>>
>>> I am using the Docker, yeah.
>>> It's running on our server cluster, which has a Xeon Gold 5117 (2.0-2.8
>>> GHz); that might account for some of the difference, since the R5 3600
>>> has a faster clock (3.6-4.2 GHz).
>>>
>>> I've hesitated to update my branch because the Dockerfile specifically
>>> checks this branch out and applies a patch, though the patch isn't very
>>> extensive.
>>> That was from a while back (November, maybe?), and I know you guys have
>>> been integrating things into the main branch since then (thanks!).
>>> I was thinking I would wait until it's fully merged into mainline gem5,
>>> then rebase onto that and try to merge my changes in.
>>>
>>> Last I checked, the GCN3 stuff is in the dev branch, not master, right?
>>>
>>> But if it will help, maybe I should update to the head of this branch.
>>> Will I need to update the Docker as well?
>>>
>>> As for debug vs. release ROCm: I think I'm using the release version.
>>> This is what the Dockerfile built:
>>>
>>> # fetch the ROCm apt archive and install the userspace stack
>>> # (HSA runtime, HCC, HIP)
>>> ARG rocm_ver=1.6.2
>>> RUN wget -qO- repo.radeon.com/rocm/archive/apt_${rocm_ver}.tar.bz2 \
>>>     | tar -xjv \
>>>     && cd apt_${rocm_ver}/pool/main/ \
>>>     && dpkg -i h/hsakmt-roct-dev/* \
>>>     && dpkg -i h/hsa-ext-rocr-dev/* \
>>>     && dpkg -i h/hsa-rocr-dev/* \
>>>     && dpkg -i r/rocm-utils/* \
>>>     && dpkg -i h/hcc/* \
>>>     && dpkg -i h/hip_base/* \
>>>     && dpkg -i h/hip_hcc/* \
>>>     && dpkg -i h/hip_samples/*
>>>
>>>
>>> I ran a benchmark that prints that it entered main() and returns
>>> immediately; this took 9 minutes.
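>>>
>>> Essentially this (a sketch, not the exact source):
>>>
>>>     #include <stdio.h>
>>>
>>>     int main(void) {
>>>         /* all the time before this print is simulator + runtime setup */
>>>         printf("entered main\n");
>>>         return 0;
>>>     }
>>>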
>>> I've attached a debug trace with debug flags =
>>> "GPUDriver,SyscallVerbose".
>>> There are a lot of weird things going on: "syscall open: failed", "syscall
>>> brk: break point changed to [...]", and lots of ignored system calls.
>>>
>>> Head of the stats output, for reference:
>>> ---------- Begin Simulation Statistics ----------
>>> sim_seconds                          0.096192  # Number of seconds simulated
>>> sim_ticks                         96192368500  # Number of ticks simulated
>>> final_tick                        96192368500  # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
>>> sim_freq                        1000000000000  # Frequency of simulated ticks
>>> host_inst_rate                         175209  # Simulator instruction rate (inst/s)
>>> host_op_rate                           338409  # Simulator op (including micro ops) rate (op/s)
>>> host_tick_rate                      175362515  # Simulator tick rate (ticks/s)
>>> host_mem_usage                        1628608  # Number of bytes of host memory used
>>> host_seconds                           548.53  # Real time elapsed on the host
>>> sim_insts                            96108256  # Number of instructions simulated
>>> sim_ops                             185628785  # Number of ops (including micro ops) simulated
>>> system.voltage_domain.voltage               1  # Voltage in Volts
>>> system.clk_domain.clock                  1000  # Clock period in ticks
>>>
>>> Maybe something in the attached file explains it better than I can.
>>>
>>> Many thanks for your help and hard work!
>>>
>>> Dan
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jun 11, 2020 at 3:32 AM Kyle Roarty <kroa...@wisc.edu> wrote:
>>>
>>>> Running through a few applications in Docker on an R5 3600, it took me
>>>> about 2.5 minutes or less each time to start executing the program.
>>>>
>>>> I ran square, dynamic_shared, and MatrixTranspose (all from HIP), which
>>>> took about 1-1.5 minutes.
>>>>
>>>> I ran conv_bench and rnn_bench from DeepBench, which took just about 2
>>>> minutes.
>>>>
>>>> Because of that, it's possible the size of the app has an effect on
>>>> setup time, as the HIP apps are extremely small.
>>>>
>>>> Also, the commit Dan is checked out on is d0945dc ("mem-ruby: add cache
>>>> hit/miss statistics for TCP and TCC",
>>>> https://gem5.googlesource.com/amd/gem5/+/d0945dc285cf146de160808d7e6d4c1fd3f73639),
>>>> which isn't the most recent commit. I don't believe that would account
>>>> for such a large slowdown, but it doesn't hurt to try the newest commit
>>>> unless it breaks something.
>>>>
>>>> Kyle
>>>> ------------------------------
>>>> *From:* mattdsincl...@gmail.com <mattdsincl...@gmail.com>
>>>> *Sent:* Thursday, June 11, 2020 1:15 AM
>>>> *To:* gem5 users mailing list <gem5-users@gem5.org>
>>>> *Cc:* Daniel Gerzhoy <daniel.gerz...@gmail.com>; GAURAV JAIN <
>>>> gja...@wisc.edu>; Kyle Roarty <kroa...@wisc.edu>
>>>> *Subject:* Re: [gem5-users] GCN3 GPU Simulation Start-Up Time
>>>>
>>>> Gaurav & Kyle, do you know if this is the case?
>>>>
>>>> Dan, I believe the short answer is yes, although 7-8 minutes seems a
>>>> little long.  Are you running this in Kyle's Docker, or separately?  If
>>>> in the Docker, that does add some overhead, so running directly on a
>>>> system would likely be somewhat faster.  Also, are you running with the
>>>> release or debug version of the ROCm drivers?  Again, the debug version
>>>> will likely add some time.
>>>>
>>>> Matt
>>>>
>>>> On Wed, Jun 10, 2020 at 2:00 PM Daniel Gerzhoy via gem5-users <
>>>> gem5-users@gem5.org> wrote:
>>>>
>>>> I've been running simulations using the GCN3 branch:
>>>>
>>>> rocm_ver=1.6.2
>>>> $ git branch
>>>> * (HEAD detached at d0945dc)
>>>>   agutierr/master-gcn3-staging
>>>>
>>>> And I've noticed that it takes roughly 7-8 minutes to get to main().
>>>>
>>>> I'm guessing that this is the simulator setting up drivers?
>>>> Is that correct? Is there other stuff going on?
>>>>
>>>> *Has anyone found a way to speed this up? *
>>>>
>>>> I am trying to get some of the Rodinia benchmarks from the HIP-Examples
>>>> running, and debugging takes a long time as a result.
>>>>
>>>> I suspect that this is unavoidable but I won't know if I don't ask!
>>>>
>>>> Cheers,
>>>>
>>>> Dan Gerzhoy