The issue with the rocBLAS build is that we don't install rocm-cmake until 
after we install rocBLAS. It looks like rocBLAS downloads rocm-cmake 
automatically if it isn't already installed, which is why I didn't see any 
issues when I initially tested it. This patch 
(https://gem5-review.googlesource.com/c/public/gem5/+/50847) should fix the 
issue.
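
A minimal shell sketch of the ordering described above: rocm-cmake is built and installed first, and only then is rocBLAS built with the same `./install.sh -d -a all -i` call that appears in the failing log further down. It is not the content of the linked patch. The GitHub URLs, the `rocm-4.0.0` tag used to pin both projects (one possible way to get the "specific rocBLAS version" Matt asks for), and the `run`/`DRY_RUN` helper are assumptions for illustration. By default the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch only: build rocm-cmake BEFORE rocBLAS, so rocBLAS's install.sh does
# not fetch a (possibly newer) rocm-cmake on its own.
# Assumptions, not taken from the thread: the GitHub URLs below and the
# rocm-4.0.0 tag name used to pin both projects to ROCm 4.0.
set -euo pipefail

# DRY_RUN=1 (the default) only prints each command, so the ordering can be
# checked without network access or a ROCm toolchain; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [[ "$DRY_RUN" == 1 ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

ROCM_TAG="${ROCM_TAG:-rocm-4.0.0}"   # assumed tag name for the ROCm 4.0 sources

install_rocm_cmake() {
    run git clone -b "$ROCM_TAG" https://github.com/RadeonOpenCompute/rocm-cmake.git
    # The -S/-B options require CMake 3.13 or newer.
    run cmake -S rocm-cmake -B rocm-cmake/build
    run cmake --build rocm-cmake/build --target install
}

build_rocblas() {
    run git clone -b "$ROCM_TAG" https://github.com/ROCmSoftwarePlatform/rocBLAS.git
    run cd rocBLAS
    # Same install.sh invocation as in the failing build log in this thread.
    run ./install.sh -d -a all -i
}

install_rocm_cmake   # must run first
build_rocblas
```

Running it with `DRY_RUN=0` (as root, e.g. inside a Dockerfile `RUN` step) would execute the same steps for real.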

Also, I think the documentation Imad was using was the README we have in 
util/dockerfiles/gcn-gpu. I think we should just remove that README, because 
we have better documentation on gem5.org and in gem5-resources (although the 
gem5.org documentation still says to build gfx8-apu).

Kyle
________________________________
From: mattdsinclair.w...@gmail.com <mattdsinclair.w...@gmail.com>
Sent: Wednesday, September 22, 2021 1:11 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: Poremba, Matthew <matthew.pore...@amd.com>; Kyle Roarty <kroa...@wisc.edu>; 
Imad Al Assir <imad.al.as...@upc.edu>; Bobby Bruce <bbr...@ucdavis.edu>
Subject: Re: [gem5-users] Re: gem5 GCN GPU docker error

Collating my responses to the emails, since you all type faster than I do:

- Imad: glad to hear things work with the updates Matt P proposed!
- documentation: Matt P, yes we did update the documentation here: 
https://resources.gem5.org/ (e.g., 
https://resources.gem5.org/resources/square), but apparently didn't propagate 
those updates to the webpage Imad was using.  I will add that to my list for 
the week.  Bobby, I see you did part of this already.  I believe there is more 
that needs to be cleaned up based on what Imad/Matt P said, but I will wait 
until your version is checked in (imminently) before re-reading and updating.
- apt repos: Matt P, you must be right about rocBLAS updating something. Kyle, 
can you please take care of updating the Dockerfile to use the specific rocBLAS 
version we need?

Matt

On Wed, Sep 22, 2021 at 1:03 PM Bobby Bruce via gem5-users <gem5-users@gem5.org> wrote:
Just jumping in here,

I can confirm that I can't build the image anymore either; before reading these 
emails, I had assumed this was just a problem on my end. However, the image 
hosted at http://gcr.io/gem5-test/gcn-gpu should be the most recent version of 
this Docker image built before this error was introduced, so it should work.

I've updated the website script here: 
https://gem5-review.googlesource.com/c/public/gem5-website/+/50807. Apologies, 
our documentation could definitely do with some tidying up :).

--
Dr. Bobby R. Bruce
Room 3050,
Kemper Hall, UC Davis
Davis,
CA, 95616

web: https://www.bobbybruce.net


On Wed, Sep 22, 2021 at 10:02 AM Imad Al Assir via gem5-users <gem5-users@gem5.org> wrote:
Dear Matt,

Many thanks for catching this error! It did indeed solve the problem: I was 
able to successfully run square and other applications from hip-samples on 
both the manually built Docker image (with everything related to rocBLAS and 
MIOpen commented out) and the pre-built Docker image, which I believe has 
rocBLAS and MIOpen installed (judging by its size).

Many thanks again,
Imad

On Sep 22 2021, at 6:48 pm, Poremba, Matthew <matthew.pore...@amd.com> wrote:

Hi Imad,

Yes, the Docker build seems to have broken in the past few days.

Regarding the benchmark not completing, please change your command to use 3 
CPUs:

docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources \
                -w /gem5 gcr.io/gem5-test/gcn-gpu \
                build/GCN3_X86/gem5.opt configs/example/apu_se.py -n3 \
                --benchmark-root=/gem5-resources/src/gpu/square/bin \
                -c square

ROCm 4.0 now requires 3 CPUs to run. I thought we had updated the README.md 
and the website to reflect this before the gem5 21.1 release, but it looks like 
they are not up to date.
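
A small wrapper can make the CPU-count requirement harder to forget. `run_square` is a hypothetical helper, not part of gem5 or gem5-resources: it assembles the corrected command above with `-n3` by default and rejects smaller values. Treating 3 as a minimum rather than an exact count is an assumption; the email only says ROCm 4.0 requires 3 CPUs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (run_square is not part of gem5 or gem5-resources):
# wraps the corrected docker command so -n cannot silently drop below 3.
set -euo pipefail

run_square() {
    local ncpus="${1:-3}"
    # The thread says ROCm 4.0 needs 3 CPUs; treating 3 as a minimum rather
    # than an exact value is an assumption.
    if (( ncpus < 3 )); then
        echo "error: ROCm 4.0 needs 3 CPUs, got -n${ncpus}" >&2
        return 1
    fi
    local cmd=(
        docker run --rm
        -v "$PWD/gem5:/gem5" -v "$PWD/gem5-resources:/gem5-resources"
        -w /gem5 gcr.io/gem5-test/gcn-gpu
        build/GCN3_X86/gem5.opt configs/example/apu_se.py "-n${ncpus}"
        --benchmark-root=/gem5-resources/src/gpu/square/bin
        -c square
    )
    # DRY_RUN=1 (the default) prints the command; DRY_RUN=0 runs it.
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

`run_square` prints the command, `DRY_RUN=0 run_square` runs it, and `run_square 2` fails with an error message instead of starting a simulation that never finishes.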
-Matt
From: Imad Al Assir via gem5-users <gem5-users@gem5.org>
Sent: Wednesday, September 22, 2021 9:31 AM
To: Matt Sinclair <sincl...@cs.wisc.edu>
Cc: gem5 users mailing list <gem5-users@gem5.org>; Kyle Roarty <kroa...@wisc.edu>; 
Imad Al Assir <imad.al.as...@upc.edu>
Subject: [gem5-users] Re: gem5 GCN GPU docker error


[CAUTION: External Email]

Hello,
Thank you for your reply. I was simply following the documentation on the gem5 
website: https://www.gem5.org/documentation/general_docs/gpu_models/GCN3
To build the image, I used:
docker build -t gcn-gpu .

This command did not complete; it failed with the error I pasted in my previous 
email.

I was also using the command in the documentation to compile square:
docker run --rm -v $PWD/gem5-resources:$PWD/gem5-resources \
                -w $PWD/gem5-resources/src/gpu/square \
                gcr.io/gem5-test/gcn-gpu make square

(Note: "make square", NOT "make gfx8-apu" as written in the documentation; the 
latter failed with "no rule to make target 'gfx8-apu'", which I assumed was a 
typo.)

To run it, I also used the command in the documentation:
docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources \
                -w /gem5 gcr.io/gem5-test/gcn-gpu \
                build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2 \
                --benchmark-root=/gem5-resources/src/gpu/square/bin \
                -c square

Note that in these commands I changed the path of square to 
'gem5-resources/src/gpu/square' instead of 'gem5-resources/src/square', because 
that is where I found its source code.
I also tried downloading the pre-built binary of square (from the gem5-resources 
website: http://resources.gem5.org/README), but the result was the same: the 
application ran indefinitely.

Thanks again for your help,
Imad

PS: If it helps, here are the last things printed when running square in gem5 
in the pre-built docker image:

[...] just warnings

gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.1.0.1
gem5 compiled Sep 21 2021 14:52:55
gem5 started Sep 22 2021 15:26:26
gem5 executing on 8d532399b09e, pid 1
command line: build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2 --benchmark-root=/gem5-resources/src/gpu/square/bin -c square

info: Standard input is not a terminal, disabling listeners.
Num SQC =  1 Num scalar caches =  1 Num CU =  4
coalescer.slave is deprecated. `slave` is now called `in_ports`
warn: coalescer.slave is deprecated. `slave` is now called `in_ports`
warn: coalescer.slave is deprecated. `slave` is now called `in_ports`

[...] same warning as the one right above this line, repeated multiple times

warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
build/GCN3_X86/sim/simulate.cc:107: info: Entering event queue @ 0.  Starting simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy updates recently became the responsibility of SLICC state machines. Make sure to setMRU() near callbacks in .sm files!
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall access(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)

[...] same warning as above repeated multiple times

build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:84: warn: ignoring syscall rt_sigaction(...)
      (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:84: warn: ignoring syscall rt_sigprocmask(...)
      (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall get_mempolicy(...)
build/GCN3_X86/arch/generic/debugfaults.hh:144: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 'frndint' unimplemented
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:699: warn: unimplemented ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1676: warn: mmap: writing to shared mmap region is currently unsupported. The write succeeds on the target, but it will not be propagated to the host or shared mappings
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:450: warn: Signal events are only supported currently
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:594: warn: unimplemented ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:604: warn: unimplemented ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
info: running on device
info: architecture on AMD GPU device is: 801
info: allocate host and device mem (  7.63 MB)
info: launch 'vector_square' kernel
build/GCN3_X86/sim/syscall_emul.cc:84: warn: ignoring syscall sched_yield(...)
      (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)

On Sep 22 2021, at 5:17 pm, Matt Sinclair <sincl...@cs.wisc.edu> wrote:
Hi Imad,

I built the Docker image earlier this week and did not have any problems (e.g., 
I ran square and it completed in < 2 hours). How are you building it, and how 
are you running the applications you mentioned?

Thanks,
Matt


On Wed, Sep 22, 2021 at 12:31 AM Imad Al Assir via gem5-users <gem5-users@gem5.org> wrote:
Hello,
Is there a problem with the most recent gcn-gpu Dockerfile?
I tried building it several times on Ubuntu 20.04 and 18.04, but it kept failing 
with this error:

[...]
Unpacking rocblas (2.32.0-cc18d25f) ...
dpkg: dependency problems prevent configuration of rocblas:
 rocblas depends on rocm-core; however:
  Package rocm-core is not installed.


dpkg: error processing package rocblas (--install):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of rocblas-dev:
 rocblas-dev depends on rocblas (>= 2.32.0); however:
  Package rocblas is not configured yet.


dpkg: error processing package rocblas-dev (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 rocblas
 rocblas-dev
+ check_exit_code 1
+ ((  1 != 0  ))
+ exit 1
The command '/bin/sh -c ./install.sh -d -a all -i' returned a non-zero code: 1


I also tried downloading the pre-built Docker image (gcr.io/gem5-test/gcn-gpu) 
and built gem5, apparently with no errors (only a compiler warning that 
deprecated namespaces are not supported). Then, when I tried running the 
'square' sample application and others from gem5-resources/src/gpu/hip-samples 
(e.g. MatrixTranspose, dynamic_shared, inline_asm, etc.), they kept running 
indefinitely (> 2 hours), and I had to kill them.


Could you please try building the latest version of the gcn-gpu Dockerfile 
and/or running a sample application on the pre-built Docker image, and let us 
know whether it works and, if not, how to fix the problem?


Thanks in advance,
Imad Al Assir
_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org