Try assigning GPUs to the MPI processes and threads separately. For example, if 
you have 4 GPUs, use 0,1:2,3 (colons separate MPI processes, commas separate 
threads). You can set this in ‘Which GPUs to use’ under the Compute tab.
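
To make the GPU string less error-prone to type, here is a minimal Python 
sketch that builds it from a list of GPU IDs. It assumes RELION's convention 
that colons separate MPI processes and commas separate the threads within a 
process, and that the GPUs are split evenly. The function name gpu_id_string is 
made up for illustration and is not part of RELION.

```python
def gpu_id_string(gpu_ids, n_workers):
    """Build a GPU ID string for n_workers MPI processes.

    Hypothetical helper (not part of RELION). Assumes GPUs are divided
    evenly: colons separate MPI processes, commas separate threads.
    """
    if not gpu_ids or n_workers < 1 or len(gpu_ids) % n_workers != 0:
        raise ValueError("GPU count must be a positive multiple of n_workers")
    per_worker = len(gpu_ids) // n_workers
    # Consecutive GPUs are grouped together for each MPI process.
    groups = [gpu_ids[i:i + per_worker]
              for i in range(0, len(gpu_ids), per_worker)]
    return ":".join(",".join(str(g) for g in group) for group in groups)


# 4 GPUs shared by 2 MPI processes, 2 threads each.
print(gpu_id_string([0, 1, 2, 3], 2))  # prints 0,1:2,3
```

Paste the resulting string into the ‘Which GPUs to use’ field. With one GPU per 
process, the same helper gives 0:1:2:3.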

Additionally, copy the particles to a scratch directory instead of pre-reading 
them into RAM. You can skip padding and gridding as well. Hope this helps!

Suparno.


> On Dec 22, 2023, at 7:23 PM, Srivastava, Dhiraj <dhiraj-srivast...@uiowa.edu> 
> wrote:
> 
> 
> It was reported by me, but the problem was not solved. I thought ccp4bb has 
> a much bigger user base and maybe someone has experienced this issue. 
> It is something related to the OS (Rocky Linux) or MPI on my computer. The 
> data set is not the problem, as I can process the same data set with exactly 
> the same parameters on a much less powerful computer without any problem. 
> Also, 2D classification on my computer uses the GPU without any problem. It 
> is only the MPI processes of RELION that are failing. CryoSPARC is not a 
> problem either.
> Thanks
> Dhiraj
> 
> From: Takanori Nakane <tnakane.prot...@osaka-u.ac.jp>
> Sent: Friday, December 22, 2023 5:35 PM
> To: Srivastava, Dhiraj <dhiraj-srivast...@uiowa.edu>
> Cc: CCP4BB@JISCMAIL.AC.UK <CCP4BB@JISCMAIL.AC.UK>
> Subject: [External] Re: [ccp4bb] Relion issue with MPI
>  
> Hi,
> 
> First of all, please report details of your hardware and your job.
> 
> - Type of GPU
> - Number of GPUs
> - GPU memory size
> - Box size
> - Number of threads
> - Number of MPI processes
> - Full command line
> 
> Do you get the same error in ALL datasets (including our
> tutorial dataset) or only on this particular dataset?
> 
> A very similar issue was reported in
> https://github.com/3dem/relion/issues/1056
> but I do not know what the cause is at the moment.
> 
> Best regards,
> 
> Takanori Nakane
> 
> On 12/23/23 03:31, Srivastava, Dhiraj wrote:
> > Hi
> > I am trying to use RELION and I am getting an error when trying to use MPI 
> > (for 3D classification and 3D auto-refine).
> > 
> > 
> > ERROR: out of memory in 
> > /home/lvantol/relion5/relion/src/acc/cuda/custom_allocator.cuh at line 
> > 436 (error-code 2)
> > 
> > in: /home/lvantol/relion5/relion/src/acc/cuda/cuda_settings.h, line 65
> > 
> > ERROR:
> > 
> > A GPU-function failed to execute.
> > 
> > 
> > 2D classification is working fine with significant GPU usage. I tried 3 
> > different versions (4, 4 beta and 5 beta), one installed by the vendor 
> > (Exxact), and all have the same issue. I am able to do 3D auto-refine 
> > and 3D classification on the same data set using our cluster without any 
> > problem. Did anyone encounter a similar issue earlier? How can I fix 
> > this problem?
> > 
> > 
> > Thank you
> > 
> > Dhiraj
> > 
> > 
> > 
> > ------------------------------------------------------------------------
> > 
> > To unsubscribe from the CCP4BB list, click the following link:
> > https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1 
> > <https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1>
> > 
> 

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list 
hosted by www.jiscmail.ac.uk, terms & conditions are available at 
https://www.jiscmail.ac.uk/policyandsecurity/
