Hello again.
I forgot to mention earlier that after the second restart, just before the
segmentation fault, I get the following warnings:
[59000] WARNING at fileconnection.cpp:355 in refill; REASON='JWARNING(false) failed'
Message: Size of file smaller than what we expected
I'd appreciate any help, or any work you do to fix this issue!

-- Nate TeBlunthuis

PhD Candidate,
Department of Communication,
Community Data Science Collective
University of Washington
https://teblunthuis.cc

________________________________
From: Nate E TeBlunthuis
Sent: Monday, March 23, 2020 1:12 PM
To: dmtcp-forum@lists.sourceforge.net <dmtcp-forum@lists.sourceforge.net>
Subject: Segmentation faults with R.

Greetings,
I am fitting models using the rstanarm package (which is part of the mc-stan
system for statistical modeling). I'm trying to checkpoint my models using
DMTCP under a Slurm scheduler. I tested checkpointing while fitting toy models
with DMTCP, and it seemed to work fine: I can checkpoint and resume multiple
times and still get a valid model in the end.

But when I try to fit larger models that use around 24 GB of memory, I run into
problems with repeated checkpoints and restarts. Strangely, I can successfully
checkpoint and resume once, but after resuming from the second checkpoint, the
next attempt to checkpoint fails with a segmentation fault.

I am using DMTCP 3.0, installed by my cluster's administrators. I have tried
both R 3.5.2 compiled with gcc and R 3.6.0 compiled with icc.

I'm running:

    dmtcp_launch -p 2020 --rm --no-gzip --checkpoint-open-files --allow-file-overwrite $my_command

I also tried this with and without the --disable-dl-plugin flag.
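For anyone trying to reproduce the launch/restart cycle under Slurm, here is a
minimal sketch of a batch-script helper that starts fresh on the first
submission and resumes on later ones. It is hypothetical, not the script used
in these runs: the function names `dmtcp_mode` and `run_job` are made up, and
it assumes checkpoints go to the job's working directory, where DMTCP is
expected to write `dmtcp_restart_script.sh`.

```shell
#!/bin/bash
# Hypothetical Slurm helper: resume from the last DMTCP checkpoint if one
# exists, otherwise start the job under dmtcp_launch.
# Assumption: checkpoints are written to the working directory, where DMTCP
# places dmtcp_restart_script.sh after a successful checkpoint.

# Print "restart" if a restart script exists in $1 (default "."), else "launch".
dmtcp_mode() {
  local ckpt_dir="${1:-.}"
  if [ -e "$ckpt_dir/dmtcp_restart_script.sh" ]; then
    echo restart
  else
    echo launch
  fi
}

# Run the job; $my_command is the same R invocation used with dmtcp_launch.
run_job() {
  if [ "$(dmtcp_mode .)" = restart ]; then
    ./dmtcp_restart_script.sh
  else
    dmtcp_launch -p 2020 --rm --no-gzip --checkpoint-open-files \
      --allow-file-overwrite $my_command
  fi
}
```

Calling `run_job` as the last line of the sbatch script would then either
launch or restart, depending on whether a checkpoint already exists.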

Since DMTCP works fine with toy models that don't use much RAM, I wonder whether
address space layout randomization (ASLR) could be a factor.
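One quick way to probe the ASLR question is to check how the kernel is
configured and, for a test run, to disable randomization only for the job.
This is a minimal sketch assuming a Linux node with util-linux's `setarch`
available; `aslr_setting` is a hypothetical helper name.

```shell
#!/bin/bash
# Report the kernel's ASLR setting (Linux only):
#   0 = disabled, 1 = stack/mmap/vDSO randomized, 2 = as 1 plus heap (brk).
aslr_setting() {
  local f=/proc/sys/kernel/randomize_va_space
  if [ -r "$f" ]; then
    cat "$f"
  else
    echo unknown
  fi
}

echo "randomize_va_space = $(aslr_setting)"

# To test the hypothesis for one run without root, disable randomization for
# this process tree only (setarch -R is util-linux's --addr-no-randomize):
#   setarch "$(uname -m)" -R dmtcp_launch ... $my_command
```

Because a restart starts a new process, the same `setarch ... -R` prefix would
presumably need to be applied to the restart command as well.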
I'm more than happy to provide more information if it can help.
-- Nate TeBlunthuis


_______________________________________________
Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
