Python does not start as root. My ldd output shows exactly the same libraries as yours, and the strace output does not contain a single occurrence of 'ipmi0'.

Instead there seems to be another explanation.

If we look at python's ancestors and their open files, we can guess which process opened /dev/ipmi0.

$ pstree -p 44409
slurmstepd(44409)─┬─bash(44540)───python(8711)
                  ├─{slurmstepd}(44462)
                  ├─{slurmstepd}(44463)
                  ├─{slurmstepd}(44464)
                  ├─{slurmstepd}(44527)
                  └─{slurmstepd}(44528)

$ ls -lah /proc/8711/fd
<skip>
lrwx------ 1 s9951545 p_ffmk 64 Mar  2 10:47 5 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  2 10:47 7 -> /dev/ipmi0

$ ls -lah /proc/44540/fd
<skip>
lrwx------ 1 s9951545 p_ffmk 64 Mar  2 10:40 5 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  2 10:40 7 -> /dev/ipmi0

$ ls -lah /proc/44409/fd
ls: cannot open directory /proc/44409/fd: Permission denied

$ ls -lah /proc/44462/fd
ls: cannot open directory /proc/44462/fd: Permission denied

Unfortunately, I can't see which files the slurmstepd processes have open, but it would make sense for them to open /dev/ipmi0.
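
The manual walk up the process tree above can be scripted. This is a minimal sketch, assuming Linux /proc and Python 3; the helper names are made up for illustration. A process whose fd directory is not readable (such as slurmstepd here) is reported as None instead of raising:

```python
import os

def parent_pid(pid):
    """Read the parent PID from /proc/<pid>/status (0 if there is none)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("PPid:"):
                return int(line.split()[1])
    return 0

def fds_pointing_to(pid, target):
    """Return the fd numbers of `pid` that resolve to `target`,
    or None if we are not allowed to look."""
    fd_dir = f"/proc/{pid}/fd"
    try:
        names = os.listdir(fd_dir)
    except PermissionError:
        return None
    hits = []
    for name in names:
        try:
            if os.readlink(os.path.join(fd_dir, name)) == target:
                hits.append(int(name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
    return sorted(hits)

def trace_ancestors(pid, target):
    """Walk from `pid` up through its parents, collecting (pid, fds) pairs."""
    chain = []
    while pid > 0:
        try:
            chain.append((pid, fds_pointing_to(pid, target)))
            pid = parent_pid(pid)
        except FileNotFoundError:
            break  # the process exited while we were looking
    return chain
```

With the PIDs from the pstree output, trace_ancestors(8711, "/dev/ipmi0") should report fds 5 and 7 for python and bash, and None for slurmstepd, matching the ls output above.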

Thus the whole picture looks as follows:

slurmstepd opens /dev/ipmi0 and then starts a bash process, which inherits the open files from its parent. Then I start python in bash, and python in turn inherits the open file descriptors for /dev/ipmi0.

It seems that on some nodes of our system, users were not allowed to open this device, and that caused the error on restart. Then either I moved to different nodes or the admins updated the system-wide policy during the last month, and the error vanished.

I think the fact that slurmstepd does not close these descriptors before calling exec looks like a bug.
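
To illustrate the mechanism (this is not slurmstepd's code, just a sketch of descriptor inheritance in general), here is a small Python 3 example. The function name and the probe script are hypothetical. It opens /dev/null, starts a child process, and checks whether the child still has the descriptor. Marking the descriptor close-on-exec is what would keep it from leaking into bash and python:

```python
import fcntl
import os
import subprocess
import sys

# Child program: exit 0 if the fd number given in argv[1] is open, 1 otherwise.
PROBE = """
import os, sys
try:
    os.fstat(int(sys.argv[1]))
except OSError:
    sys.exit(1)
"""

def child_sees_fd(inheritable):
    """Open /dev/null, exec a child, and report whether the child has the fd."""
    low = os.open(os.devnull, os.O_RDONLY)
    # Duplicate to a high fd number so that it cannot collide with
    # descriptors the child opens on its own.
    fd = fcntl.fcntl(low, fcntl.F_DUPFD, 200)
    os.close(low)
    try:
        # inheritable=False sets FD_CLOEXEC: the kernel closes the fd at exec.
        os.set_inheritable(fd, inheritable)
        # close_fds=False: subprocess must not close descriptors for us.
        result = subprocess.run(
            [sys.executable, "-c", PROBE, str(fd)], close_fds=False
        )
        return result.returncode == 0
    finally:
        os.close(fd)
```

Calling child_sees_fd(True) mirrors what seems to happen with slurmstepd: the descriptor survives the exec. child_sees_fd(False) shows the close-on-exec behaviour I would have expected.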

On 03/01/2016 06:39 PM, Rohan Garg wrote:
Interesting! So your python does talk to the ipmi0 device. I was confused
about why DMTCP would try to open a connection to /dev/ipmi0 on restart.

One thing still puzzles me though: if /dev/ipmi0 didn't have the right
set of permissions earlier, how was python able to open the device at
launch time?  (Perhaps, the python process started as root and dropped
privileges?)

Are you using any IPMI library/module for python? Or it could be that
one of the libraries/modules you are importing or the python interpreter
itself is linked against some IPMI library. That could explain why
your python interpreter opens the device. One way to verify this is
to see the output of the "ldd" command on the libraries/interpreter
executable. Example:


       # The following command lists the libraries the python interpreter
       #  is linked against.
       $ ldd `which python`
         linux-vdso.so.1 (0x00007ffff01a9000)
         libpython2.7.so.1.0 => /usr/lib64/libpython2.7.so.1.0 (0x00007f27f1dea000)
         libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f27f1bcd000)
         libc.so.6 => /lib64/libc.so.6 (0x00007f27f1827000)
         libdl.so.2 => /lib64/libdl.so.2 (0x00007f27f1623000)
         libutil.so.1 => /lib64/libutil.so.1 (0x00007f27f1420000)
         libm.so.6 => /lib64/libm.so.6 (0x00007f27f1121000)
         /lib64/ld-linux-x86-64.so.2 (0x000055a49e096000)

Another thing that you can try is to run your python interpreter under
"strace -f" to trace the system calls it makes, and to verify if it or some
child process opens the IPMI device.

----- Original Message -----
From: "Maksym Planeta" <mplan...@os.inf.tu-dresden.de>
To: "Rohan Garg" <rohg...@ccs.neu.edu>
Cc: "dmtcp-forum" <Dmtcp-forum@lists.sourceforge.net>
Sent: Tuesday, March 1, 2016 3:31:02 AM
Subject: Re: [Dmtcp-forum] Restart does not work: /dev/ipmi0 Permission denied

Hi Rohan,

thank you for the reply. Somehow the issue was solved without my
intervention. When I retry the same thing now, the error with python
vanishes, i.e. I'm able to restart the python shell. The reason is
that the access rights to /dev/ipmi0 have changed:

$ ls -lah /dev/ipmi0
crw-rw-rw- 1 root root 245, 0 Nov  5 10:48 /dev/ipmi0

But for the sake of completeness, here are the answers to your questions,
in case they are still useful.

1. I never run python as root user on this machine, simply because I
have no root access.

2. The dmtcp version is 2.4.4.

3. I think this is some Bull Linux, but the system release reports the following:
$ cat /etc/system-release
Red Hat Enterprise Linux Server release 6.4 (Santiago)

Open file descriptors before checkpoint:

$ ls -l /proc/$(ps x | grep -e python | grep -v grep | awk '{print $1}')/fd
total 0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 0 -> /dev/pts/8
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 1 -> /dev/pts/8
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 2 -> /dev/pts/8
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 5 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 7 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 821 -> socket:[34528195]
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 827 -> /dev/pts/8
l-wx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 828 -> /tmp/dmtcp-s9951545@taurusi5591/jassertlog.4b3242428f3a397f-40000-56d5522b_python
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 831 -> /tmp/dmtcp-s9951545@taurusi5591/dmtcpSharedArea.4b3242428f3a397f-40000-56d5522b.56d5522b9

After checkpoint:

$ ls -l /proc/$(ps x | grep -e python | grep -v grep | awk '{print $1}')/fd
total 0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 0 -> /dev/pts/8
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 1 -> /dev/pts/8
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 2 -> /dev/pts/8
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 5 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 7 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 821 -> socket:[34528195]
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 827 -> /dev/pts/8
l-wx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 828 -> /tmp/dmtcp-s9951545@taurusi5591/jassertlog.4b3242428f3a397f-40000-56d5522b_python
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 831 -> /tmp/dmtcp-s9951545@taurusi5591/dmtcpSharedArea.4b3242428f3a397f-40000-56d5522b.56d5522b9

And after restart:

$ ls -l /proc/$(ps x | grep -e python | grep -v grep | awk '{print $1}')/fd
total 0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 0 -> /dev/pts/6
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 1 -> /dev/pts/6
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 2 -> /dev/pts/6
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 5 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 7 -> /dev/ipmi0
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 821 -> socket:[34528397]
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 827 -> /dev/pts/6
lrwx------ 1 s9951545 p_ffmk 64 Mar  1 09:26 831 -> /tmp/dmtcp-s9951545@taurusi5591/dmtcpSharedArea.4b3242428f3a397f-40000-56d5522b.56d552486
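
The three listings above can also be compared mechanically instead of by eye. A minimal sketch, assuming Linux /proc and Python 3; both function names are hypothetical. It snapshots a process's descriptor table and reports only the entries that changed between two snapshots:

```python
import os

def fd_table(pid):
    """Map fd number -> link target for a process we are allowed to inspect."""
    fd_dir = f"/proc/{pid}/fd"
    table = {}
    for name in os.listdir(fd_dir):
        try:
            table[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
    return table

def diff_fd_tables(before, after):
    """Return {fd: (old_target, new_target)} for every fd that differs.
    None stands for 'not open in that snapshot'."""
    changes = {}
    for fd in sorted(set(before) | set(after)):
        old, new = before.get(fd), after.get(fd)
        if old != new:
            changes[fd] = (old, new)
    return changes
```

Applied to the listings before checkpoint and after restart, such a diff would show the terminal (/dev/pts/8 to /dev/pts/6), the socket inode, the jassertlog file and the shared-area file changing, while fds 5 and 7 keep pointing at /dev/ipmi0.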



On 02/29/2016 06:13 PM, Rohan Garg wrote:
Hi Maksym,

This looks like a strange issue. I have some questions about your setup.

   - Do you launch your python interpreter with sudo privileges or as the
     root user?
   - What python version are you using? What DMTCP version are you using?
   - What distro are you using?

At restart time, DMTCP tries to restore file connections that the
process had opened at checkpoint time. I'm not sure why it's trying
to open '/dev/ipmi0' on restart.  Can you share the output of the
following command: ls -l /proc/<PID>/fd prior to checkpointing?
(Here PID is the process id of the python interpreter that you launch
under DMTCP.) This will help us identify if for some strange reason
the python interpreter opens /dev/ipmi0 on your setup.

Thanks,
Rohan

On Feb 1, 2016, at 3:39 AM, Maksym Planeta <mplan...@os.inf.tu-dresden.de> wrote:

Hello,

I'm trying to set up DMTCP. I installed it and launched the coordinator. Then I
launched a python interpreter, created a variable, switched to the coordinator,
initiated a checkpoint, and killed all coordinator clients with the "k" command.

After this, the python interpreter was terminated and several new files appeared
in the directory where the coordinator was running.

Next I wanted to restart the interpreter. I still had my coordinator open, so I
decided to use dmtcp_restart to launch python again:

dmtcp_restart ckpt_*.dmtcp

But this resulted in the following error report:

[40000] ERROR at fileconnection.cpp:863 in openFile; REASON='JASSERT(fd != -1) failed'
      _path = /dev/ipmi0
      (strerror((*__errno_location ()))) = Permission denied

I have this file:

$ ls /dev/ipmi0  -lah
crw-rw---- 1 root root 245, 0 Nov  4 10:22 /dev/ipmi0

But I don't have root permissions to change the access rights on the file.
Could you tell me what I can do about this? And why does DMTCP try to access a
file which the interpreter was never allowed to access?

--
Regards,
Maksym Planeta

Dmtcp-forum mailing list
Dmtcp-forum@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dmtcp-forum



--
Regards,
Maksym Planeta

