Hi Marc,

I've seen systemd be overly helpful (read: not at all helpful) when it observes state changing outside of its control. I once hit a bug with GPFS (the real issue may have been systemd, but the fix went into GPFS) in which systemd unmounted GPFS filesystems a split second after they were mounted. The filesystem would mount, but systemd decided the /dev/$fs device wasn't "ready", so it helpfully unmounted it. I don't know much about systemd (I avoid it), but based on that experience I could certainly see systemd actively killing the sdrserv process shortly after the mm* commands start it, if systemd doesn't expect it to be running.

I'd be curious to see the contents of /var/adm/ras/mmsdrserv.log from the manager nodes, to find out whether sdrserv is indeed starting but getting harpooned by systemd.
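
Alongside the log, the process table could answer the same question. What follows is only a rough sketch, not anything GPFS ships: it takes two `ps` snapshots a few seconds apart and reports which helper processes were present in the first but gone in the second. The helper names come from the ps listing quoted below; the function names, the 5-second delay, and the assumption that mmlsconfig lives in /usr/lpp/mmfs/bin next to the other binaries are mine.

```python
import os
import subprocess
import time

# Helper names taken from the ps listing quoted in this thread.
HELPERS = ("mmsdrserv", "mmccrmonitor")

# Assumed location, based on the other binaries shown under /usr/lpp/mmfs/bin.
MMLSCONFIG = "/usr/lpp/mmfs/bin/mmlsconfig"


def helpers_in(ps_text, helpers=HELPERS):
    """Return the set of helper names found in ps-style output.

    A helper counts as present when some whitespace-separated token on a
    line has that helper as its basename, so /var/adm/ras/mmsdrserv.log
    does not count as mmsdrserv.
    """
    found = set()
    for line in ps_text.splitlines():
        for token in line.split():
            name = os.path.basename(token)
            if name in helpers:
                found.add(name)
    return found


def vanished(before_text, after_text, helpers=HELPERS):
    """List helpers present in the first snapshot but missing from the second."""
    gone = helpers_in(before_text, helpers) - helpers_in(after_text, helpers)
    return sorted(gone)


def snapshot():
    """Capture the full command line of every process (procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    )
    return result.stdout


def check_after_mmlsconfig(delay_seconds=5):
    """Run mmlsconfig, then report helpers that disappear within the delay."""
    subprocess.run([MMLSCONFIG], stdout=subprocess.DEVNULL, check=False)
    first = snapshot()
    time.sleep(delay_seconds)  # arbitrary delay; tune to taste
    return vanished(first, snapshot())
```

Calling `check_after_mmlsconfig()` as root on a manager node would return an empty list if the helpers survive the delay, and something like a list containing `mmsdrserv` if that process was started and then went away. It cannot say who killed the process, so the log is still the place to look for the cause.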

-Aaron

On 7/28/16 4:16 PM, Marc A Kaplan wrote:
Allow me to restate and demonstrate:

Even if systemd or any explicit kill signals destroy any/all running
mmccr* and mmsdr* processes, simply running mmlsconfig will fire up new
mmccr* and mmsdr* processes. For example:

## I used kill -9 to kill all mmccr, mmsdr, lxtrace, ... processes

[root@n2 gpfs-git]# ps auwx | grep mm
root      9891  0.0  0.0 112640   980 pts/1    S+   12:57   0:00 grep --color=auto mm

[root@n2 gpfs-git]# mmlsconfig
Configuration data for cluster madagascar.frozen:
-------------------------------------------------
clusterName madagascar.frozen
   ...
worker1Threads 1022
adminMode central

File systems in cluster madagascar.frozen:
------------------------------------------
/dev/mak
/dev/x1
/dev/yy
/dev/zz

## mmlsconfig "needs" ccr and sdrserv, so if it doesn't see them, it restarts them!

[root@n2 gpfs-git]# ps auwx | grep mm
root      9929  0.0  0.0 114376  1696 pts/1    S    12:58   0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root     10110  0.0  0.0  20536   128 ?        Ss   12:58   0:00 /usr/lpp/mmfs/bin/lxtrace-3.10.0-123.el7.x86_64 on /tmp/mmfs/lxtrac
root     10125  0.0  0.0 493264 11064 ?        Ssl  12:58   0:00 /usr/lpp/mmfs/bin/mmsdrserv 1191 10 10 /var/adm/ras/mmsdrserv.log 1
root     10358  0.0  0.0 1700488 17636 ?       Sl   12:58   0:00 python /usr/lpp/mmfs/bin/mmsysmon.py
root     10440  0.0  0.0 114376   804 pts/1    S    12:59   0:00 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/mmccrmonitor 15
root     10442  0.0  0.0 112640   976 pts/1    S+   12:59   0:00 grep --color=auto mm


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776