2015-10-13 15:44 GMT+02:00 Adi Kriegisch <a...@kriegisch.at>:
> Package: ctdb
> Version: 2.5.4+debian0-4
>
> Dear maintainers,

Hello Adi,

Sorry for my late reply.

> I recently upgraded a samba cluster from Wheezy (with Kernel, ctdb, samba
> and glusterfs from backports) to Jessie. The cluster itself is way older
> and basically always worked. Since the upgrade to Jessie 'smbstatus -b'
> (almost always) just hangs the whole cluster; I need to interrupt the call
> with ctrl+c (or run with 'timeout 2') to avoid a complete cluster lockup
> leading to the other cluster nodes being banned and the node I run smbstatus
> on to have ctdbd run at 100% load but not being able to recover.

How do you recover then? By sending SIGKILL to ctdbd?

> The cluster itself consists of three nodes sharing three cluster ips. The
> only service ctdb manages is Samba. The lock file is located on a mirrored
> glusterfs volume.
>
> running and interrupting the hanging smbstatus leads to the following log
> messages in /var/log/ctdb/log.ctdb:
> | 2015/10/13 15:09:24.923002 [19378]: Starting traverse on DB
> | smbXsrv_session_global.tdb (id 2592646)
> | 2015/10/13 15:09:25.505302 [19378]: server/ctdb_traverse.c:644 Traverse
> | cancelled by client disconnect for database:0x6b06a26d
> | 2015/10/13 15:09:25.505492 [19378]: Could not find idr:2592646
> | [...]
> | 2015/10/13 15:09:25.507553 [19378]: Could not find idr:2592646
>
> 'ctdb getdbmap' lists that database, but also lists a second entry for
> smbXsrv_session_global.tdb:
> | dbid:0x521b7544 name:smbXsrv_version_global.tdb
> path:/var/lib/ctdb/smbXsrv_version_global.tdb.0
> | dbid:0x6b06a26d name:smbXsrv_session_global.tdb
> path:/var/lib/ctdb/smbXsrv_session_global.tdb.0
> (I have no idea if that has always been the case or if that happened after
> the upgrade).
>
> Calling 'smbstatus --locks' and 'smbstatus --shares' works just fine.

Have you checked which of --processes and --notify hangs? Does it also hang
with "-b --fast"?
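
In case it helps, here is a rough sketch of how the options could be tried one
at a time without locking up the cluster. It reuses your 'timeout 2' trick and
assumes GNU coreutils timeout is installed; SMBSTATUS, LIMIT and probe are
names made up for this sketch, not samba or ctdb settings.

```shell
#!/bin/sh
# Sketch: run each smbstatus mode under a short timeout to see which one hangs.
# SMBSTATUS and LIMIT are hypothetical knobs for this sketch only.
SMBSTATUS="${SMBSTATUS:-smbstatus}"
LIMIT="${LIMIT:-2}"

probe() {
    # "$@" holds the smbstatus options under test.
    rc=0
    timeout "$LIMIT" "$SMBSTATUS" "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout exits with 124 when it had to stop the command.
        echo "HANG: $*"
    else
        echo "done (exit $rc): $*"
    fi
}

probe --processes
probe --notify
probe -b --fast
probe -b
```

Any line starting with "HANG:" names a combination that did not finish within
the limit; the plain -b run is kept last because it is the one known to hang.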

> 'strace'ing ctdbd leads to a massive amount of these messages:
> |
> write(58,"\240\4\0\0BDTC\1\0\0\0\215U\336\25\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> | 1184) = -1 EAGAIN (Resource temporarily
> unavailable)

fd 58 is probably the ctdb socket. Can you confirm? To get more useful
information, can you install gdb, ctdb-dbg and samba-dbg and send the stack
trace of ctdbd while it is in that write() call?

> Running 'ctdb_diagnostics' is only possible shortly after the cluster is
> started (ie. while smbstatus -b works) and yields the following messages:
> | ERROR[1]: /etc/krb5.conf is missing on node 0
> | ERROR[2]: File /etc/hosts is different on node 1
> | ERROR[3]: File /etc/hosts is different on node 2
> | ERROR[4]: File /etc/samba/smb.conf is different on node 1
> | ERROR[5]: File /etc/samba/smb.conf is different on node 2
> | ERROR[6]: File /etc/fstab is different on node 1
> | ERROR[7]: File /etc/fstab is different on node 2
> | ERROR[8]: /etc/multipath.conf is missing on node 0
> | ERROR[9]: /etc/pam.d/system-auth is missing on node 0
> | ERROR[10]: /etc/default/nfs is missing on node 0
> | ERROR[11]: /etc/exports is missing on node 0
> | ERROR[12]: /etc/vsftpd/vsftpd.conf is missing on node 0
> | ERROR[13]: Optional file /etc/ctdb/static-routes is not present on node 0
> '/etc/hosts' differs in some newlines and comments while 'smb.conf' only
> has some different log levels on the nodes. The rest of the messages does
> not affect ctdb as it only manages samba.

Yes. Nothing relevant here.

> Feel free to ask if you need any more information.

Regards
--
Mathieu
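
P.S. A possible way to collect both pieces of information, as a sketch only:
it assumes a Linux /proc filesystem, that gdb and the debug packages are
installed, and that it is run as root on the node where ctdbd is spinning.
fd_target and ctdbd_backtrace are helper names invented for this example.

```shell
#!/bin/sh
# Sketch only: inspect fd 58 of ctdbd and dump its backtrace.

fd_target() {
    # Print what file descriptor $2 of process $1 points to (Linux /proc).
    readlink "/proc/$1/fd/$2"
}

ctdbd_backtrace() {
    # Attach gdb to process $1 just long enough to dump all backtraces.
    gdb -p "$1" -batch -ex "thread apply all bt"
}

# Pick one ctdbd PID; fall back to an empty value if none is found.
pid=$(pidof -s ctdbd) || pid=""
if [ -n "$pid" ]; then
    fd_target "$pid" 58      # a socket shows up as "socket:[inode]"
    ctdbd_backtrace "$pid"
else
    echo "ctdbd is not running" >&2
fi
```

The backtrace is most useful if it is taken while ctdbd is at 100% load, that
is, while the EAGAIN writes are showing up in strace.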