The issue with the rocblas build is that we don't install rocm-cmake until after we install rocblas. It looks like rocblas downloads rocm-cmake automatically if it isn't already installed, which is why I didn't run into any issues when I initially tested it. This patch (https://gem5-review.googlesource.com/c/public/gem5/+/50847) should fix the issue.
Also, I think the documentation Imad was using was the README we have in util/dockerfiles/gcn-gpu. I'm of the opinion that we should just remove that README, because we have better documentation on gem5.org and in gem5-resources (although the gem5.org documentation still says to build gfx8-apu).

Kyle

________________________________
From: mattdsinclair.w...@gmail.com <mattdsinclair.w...@gmail.com>
Sent: Wednesday, September 22, 2021 1:11 PM
To: gem5 users mailing list <gem5-users@gem5.org>
Cc: Poremba, Matthew <matthew.pore...@amd.com>; Kyle Roarty <kroa...@wisc.edu>; Imad Al Assir <imad.al.as...@upc.edu>; Bobby Bruce <bbr...@ucdavis.edu>
Subject: Re: [gem5-users] Re: gem5 GCN GPU docker error

Collating responses to emails, since you all type faster than me:

- Imad: glad to hear things work with the updates Matt P proposed!
- documentation: Matt P, yes, we did update the documentation here: https://resources.gem5.org/ (e.g., https://resources.gem5.org/resources/square), but apparently didn't propagate those updates to the webpage Imad was using. I will add that to my list for the week. Bobby, I see you did part of this already. I believe there is more that needs to be cleaned up based on what Imad and Matt P said, but I will wait until your version is checked in (imminently) before re-reading and updating.
- apt repos: Matt P, you must be right about rocblas updating something. Kyle, can you please take care of updating the docker to use the specific rocblas version we need?

Matt

On Wed, Sep 22, 2021 at 1:03 PM Bobby Bruce via gem5-users <gem5-users@gem5.org> wrote:

Just jumping in here: I can confirm I can't build the image anymore. I had assumed this was just a problem on my end before reading these emails. However, the image hosted at http://gcr.io/gem5-test/gcn-gpu should be the most up-to-date version of this Docker image from before this build error was introduced. It should work.
I've updated the website script here: https://gem5-review.googlesource.com/c/public/gem5-website/+/50807. Apologies, our documentation could definitely do with some tidying up :).

--
Dr. Bobby R. Bruce
Room 3050, Kemper Hall, UC Davis
Davis, CA, 95616
web: https://www.bobbybruce.net

On Wed, Sep 22, 2021 at 10:02 AM Imad Al Assir via gem5-users <gem5-users@gem5.org> wrote:

Dear Matt,

Many thanks for catching this error! It did indeed solve the problem. I was able to successfully run square and other applications from hip-samples on both the manually built Docker image (with everything related to rocBLAS and MIOpen commented out) and the pre-built Docker image, which I believe has rocBLAS and MIOpen installed (based on its size).

Many thanks again,
Imad

On Sep 22 2021, at 6:48 pm, Poremba, Matthew <matthew.pore...@amd.com> wrote:

Hi Imad,

Yes, the docker seems to have broken in the past few days. Regarding the benchmark not completing, please change your command to use 3 CPUs:

docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources \
  -w /gem5 gcr.io/gem5-test/gcn-gpu \
  build/GCN3_X86/gem5.opt configs/example/apu_se.py -n3 \
  --benchmark-root=/gem5-resources/src/gpu/square/bin \
  -c square

ROCm 4.0 now requires 3 CPUs to run. I thought we had updated the README.md and the website before the gem5 21.1 release to reflect this, but it looks like they are not up to date.
-Matt

From: Imad Al Assir via gem5-users <gem5-users@gem5.org>
Sent: Wednesday, September 22, 2021 9:31 AM
To: Matt Sinclair <sincl...@cs.wisc.edu>
Cc: gem5 users mailing list <gem5-users@gem5.org>; Kyle Roarty <kroa...@wisc.edu>; Imad Al Assir <imad.al.as...@upc.edu>
Subject: [gem5-users] Re: gem5 GCN GPU docker error

Hello,

Thank you for your reply. I was simply following the documentation on the gem5 website: https://www.gem5.org/documentation/general_docs/gpu_models/GCN3

In other words, to build the image, I used:

docker build -t gcn-gpu .

This command didn't complete and was interrupted by the error I pasted in my previous email.

I also used the command in the documentation to compile square:

docker run --rm -v $PWD/gem5-resources:$PWD/gem5-resources \
  -w $PWD/gem5-resources/src/gpu/square gcr.io/gem5-test/gcn-gpu make square

I used "make square" rather than "make gfx8-apu" as written in the documentation. The latter failed with "no rule to make target 'gfx8-apu'", and I assumed it was a typo.
To run it, I also used the command in the documentation:

docker run --rm -v $PWD/gem5:/gem5 -v $PWD/gem5-resources:/gem5-resources \
  -w /gem5 gcr.io/gem5-test/gcn-gpu \
  build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2 \
  --benchmark-root=/gem5-resources/src/gpu/square/bin \
  -c square

Note that in these commands, I changed the path of square to 'gem5-resources/src/gpu/square' instead of 'gem5-resources/src/square', because that's where I found its code. I also tried downloading the pre-built binary of square (from the gem5-resources website: http://resources.gem5.org/README), but the result was the same: the application ran indefinitely.

Thanks again for your help,
Imad

PS: If it helps, here are the last things printed when running square in gem5 in the pre-built docker image:

[...] just warnings
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version 21.1.0.1
gem5 compiled Sep 21 2021 14:52:55
gem5 started Sep 22 2021 15:26:26
gem5 executing on 8d532399b09e, pid 1
command line: build/GCN3_X86/gem5.opt configs/example/apu_se.py -n2 --benchmark-root=/gem5-resources/src/gpu/square/bin -c square
info: Standard input is not a terminal, disabling listeners.
Num SQC = 1 Num scalar caches = 1 Num CU = 4
coalescer.slave is deprecated. `slave` is now called `in_ports`
warn: coalescer.slave is deprecated. `slave` is now called `in_ports`
warn: coalescer.slave is deprecated. `slave` is now called `in_ports`
[...]
same warning as the one right above this line, repeated multiple times
warn: system.ruby.network adopting orphan SimObject param 'ext_links'
warn: system.ruby.network adopting orphan SimObject param 'int_links'
build/GCN3_X86/sim/simulate.cc:107: info: Entering event queue @ 0. Starting simulation...
build/GCN3_X86/mem/ruby/system/Sequencer.cc:573: warn: Replacement policy updates recently became the responsibility of SLICC state machines. Make sure to setMRU() near callbacks in .sm files!
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall access(...)
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
[...] same warning as above repeated multiple times
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:84: warn: ignoring syscall rt_sigaction(...) (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:84: warn: ignoring syscall rt_sigprocmask(...) (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall get_mempolicy(...)
build/GCN3_X86/arch/generic/debugfaults.hh:144: warn: MOVNTDQ: Ignoring non-temporal hint, modeling as cacheable!
build/GCN3_X86/arch/x86/generated/exec-ns.cc.inc:27: warn: instruction 'frndint' unimplemented
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:699: warn: unimplemented ioctl: AMDKFD_IOC_ACQUIRE_VM
build/GCN3_X86/sim/syscall_emul.hh:1676: warn: mmap: writing to shared mmap region is currently unsupported. The write succeeds on the target, but it will not be propagated to the host or shared mappings
build/GCN3_X86/sim/mem_state.cc:443: info: Increasing stack size by one page.
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:450: warn: Signal events are only supported currently
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/power_state.cc:105: warn: PowerState: Already in the requested power state, request ignored
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall set_robust_list(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:594: warn: unimplemented ioctl: AMDKFD_IOC_SET_SCRATCH_BACKING_VA
build/GCN3_X86/gpu-compute/gpu_compute_driver.cc:604: warn: unimplemented ioctl: AMDKFD_IOC_SET_TRAP_HANDLER
info: running on device
info: architecture on AMD GPU device is: 801
info: allocate host and device mem ( 7.63 MB)
info: launch 'vector_square' kernel
build/GCN3_X86/sim/syscall_emul.cc:84: warn: ignoring syscall sched_yield(...) (further warnings will be suppressed)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)
build/GCN3_X86/sim/syscall_emul.cc:73: warn: ignoring syscall mprotect(...)

On Sep 22 2021, at 5:17 pm, Matt Sinclair <sincl...@cs.wisc.edu> wrote:

Hi Imad,

I just built the docker earlier this week and did not have any problems (e.g., I ran square and it completed in < 2 hours). How are you trying to build it? And how are you running the applications you mentioned?

Thanks,
Matt

On Wed, Sep 22, 2021 at 12:31 AM Imad Al Assir via gem5-users <gem5-users@gem5.org> wrote:

Hello,

Is there a problem with the most recent gcn-gpu Dockerfile? I tried building it several times on Ubuntu 20.04 and 18.04, but it kept giving me this error:

[...]
Unpacking rocblas (2.32.0-cc18d25f) ...
dpkg: dependency problems prevent configuration of rocblas:
rocblas depends on rocm-core; however:
Package rocm-core is not installed.
dpkg: error processing package rocblas (--install):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of rocblas-dev:
rocblas-dev depends on rocblas (>= 2.32.0); however:
Package rocblas is not configured yet.
dpkg: error processing package rocblas-dev (--install):
dependency problems - leaving unconfigured
Errors were encountered while processing:
rocblas
rocblas-dev
+ check_exit_code 1
+ (( 1 != 0 ))
+ exit 1
The command '/bin/sh -c ./install.sh -d -a all -i' returned a non-zero code: 1

I also tried downloading the pre-built docker image (gcr.io/gem5-test/gcn-gpu) and built gem5, apparently with no errors (but with a warning that the compiler does not support deprecated namespaces). Then, when I tried running the 'square' sample application and others from gem5-resources/src/gpu/hip-samples (e.g., MatrixTranspose, dynamic_shared, inline_asm), they kept running indefinitely (> 2 hours), and I had to kill them.

Could you please try building the latest version of the gcn-gpu Dockerfile and/or running a sample application on the pre-built docker image, and let us know whether it works and, if not, how to fix the problem?
Thanks in advance,
Imad Al Assir

_______________________________________________
gem5-users mailing list -- gem5-users@gem5.org
To unsubscribe send an email to gem5-users-le...@gem5.org