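Try assigning the GPUs to the MPI processes and threads explicitly. For example, if you have 4 GPUs, use 0,1:2,3. The colon separates MPI processes and the commas separate the threads within each process. You can set this in ‘Which GPUs to use’ under the Compute tab.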
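In case the syntax is unclear, here is a minimal sketch in Python of how I read that string. This is not RELION code, and the function name parse_gpu_ids is made up for illustration. It assumes colons separate MPI processes and commas separate the threads within each process.

```python
def parse_gpu_ids(spec):
    """Split a 'Which GPUs to use' string into one GPU list per MPI process.

    Hypothetical helper for illustration only: colons separate MPI
    processes, commas separate the threads within a process.
    """
    return [
        [int(device) for device in group.split(",") if device.strip()]
        for group in spec.split(":")
    ]


# With 4 GPUs and the string suggested above, each process gets two devices.
assignment = parse_gpu_ids("0,1:2,3")
for process, devices in enumerate(assignment, start=1):
    print(f"MPI process {process}: threads use GPUs {devices}")
```

So with 0,1:2,3 the first process runs its threads on GPUs 0 and 1 and the second on GPUs 2 and 3, instead of several processes competing for memory on the same card.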
Additionally, use a scratch directory and do not pre-read the particles into RAM. You can skip padding and gridding as well.

Hope this helps!
Suparno

> On Dec 22, 2023, at 7:23 PM, Srivastava, Dhiraj <dhiraj-srivast...@uiowa.edu> wrote:
>
> It was reported by me but the problem was not solved. I thought ccp4bb has a much bigger user base and maybe someone has experienced this issue.
> It is something related to the OS (Rocky Linux) or MPI on my computer. The data set is not the problem, as I can process the same data set with exactly the same parameters on a much less powerful computer without any problem. Also, 2D classification on my computer is using the GPU without any problem. It's only the MPI processes of RELION which are failing. cryoSPARC is not a problem either.
> Thanks
> Dhiraj
>
> From: Takanori Nakane <tnakane.prot...@osaka-u.ac.jp>
> Sent: Friday, December 22, 2023 5:35 PM
> To: Srivastava, Dhiraj <dhiraj-srivast...@uiowa.edu>
> Cc: CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
> Subject: [External] Re: [ccp4bb] Relion issue with MPI
>
> Hi,
>
> First of all, please report details of your hardware and your job.
>
> - Type of GPU
> - Number of GPU
> - GPU memory size
> - Box size
> - Number of threads
> - Number of MPI processes
> - Full command line
>
> Do you get the same error in ALL datasets (including our tutorial dataset) or only on this particular dataset?
>
> A very similar issue was reported in https://github.com/3dem/relion/issues/1056 but I do not know what is the cause at the moment.
>
> Best regards,
>
> Takanori Nakane
>
> On 12/23/23 03:31, Srivastava, Dhiraj wrote:
> > Hi
> > I am trying to use relion and I am getting error when trying to use mpi (for 3d classification and 3D auto-refine).
> >
> > ERROR: out of memory in /home/lvantol/relion5/relion/src/acc/cuda/custom_allocator.cuh at line 436 (error-code 2)
> >
> > in: /home/lvantol/relion5/relion/src/acc/cuda/cuda_settings.h, line 65
> >
> > ERROR:
> >
> > A GPU-function failed to execute.
> >
> > 2D classification is working fine with significant GPU usage. I tried 3 different versions (4, 4 beta and 5 beta), one installed by vendor (Exxact) and all have the same issue. I am able to do 3D auto-refine and 3D classification on the same data set using our cluster without any problem. did anyone encounter a similar issue earlier? How can I fix this problem?
> >
> > Thank you
> >
> > Dhiraj
> >
> > ------------------------------------------------------------------------
> >
> > To unsubscribe from the CCP4BB list, click the following link:
> > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

########################################################################

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/