Ah, just tried it on some fresh machines. Looks like the solution that
worked there isn’t making my cluster any happier. Any other thoughts?
(to be clear, looks like that was adding vdsmd-network.service as an
After target, and vdsmd.service as a Before target)
On 4 Dec 2015, at 10:06, Atin Mukherjee wrote:
You might be experiencing this:
https://www.gluster.org/pipermail/gluster-users/2015-November/024292.html
-Atin
On Dec 4, 2015 9:07 PM, "Brian Hicks" <[email protected]> wrote:
Hi all,
I’m running Gluster 3.7.6 on CentOS 7.1, and using Consul for DNS (for
example, putting all the glusterd servers at glusterfs.service.consul).
I’m seeing odd behavior when I reboot the nodes running glusterd.
Basically, glusterd doesn’t seem to be able to resolve names at boot. I
have the default settings, plus a systemd drop-in file to make sure that
glusterd starts after DNS is active (nothing complex, just After and
Requires for consul and dnsmasq). I’ve even tried adding an ExecStartPre
with a bash while loop that runs until dig can resolve the addresses
listed in the log file below. Nothing seems to help: my
etc-glusterfs-glusterd.vol.log always contains these lines, and glusterd
fails to start.
Oddly, if I run systemctl start glusterd after the boot process
completes, it starts just fine. Is there some other network target I
need to include in my systemd unit file?
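For reference, here is a minimal sketch of that kind of wait loop. It
calls getaddrinfo (the call that fails in the log below) rather than
dig. dig queries the DNS server directly, while getaddrinfo goes through
the system resolver configuration, so this checks the same lookup path
glusterd uses. The function names, the 60 second timeout, and the 1
second retry interval are assumptions for illustration, not the actual
ExecStartPre described above.

```python
import socket
import sys
import time


def wait_for_names(hosts, timeout=60.0, interval=1.0, resolve=socket.getaddrinfo):
    """Retry name resolution until every host resolves or the timeout expires.

    Returns the list of hosts that are still unresolved (empty on success).
    The `resolve` parameter exists only so the lookup can be swapped out.
    """
    deadline = time.monotonic() + timeout
    pending = list(hosts)
    while True:
        still_pending = []
        for host in pending:
            try:
                # Same libc entry point that glusterd reports as failing.
                resolve(host, None)
            except socket.gaierror:
                still_pending.append(host)
        pending = still_pending
        if not pending or time.monotonic() >= deadline:
            return pending
        time.sleep(interval)


def main(argv):
    """Exit status for a pre-start check: 0 if all names resolve, 1 otherwise."""
    unresolved = wait_for_names(argv)
    for host in unresolved:
        print("still unresolved: " + host, file=sys.stderr)
    return 1 if unresolved else 0
```

To use it as a pre-start check, the script would end with
sys.exit(main(sys.argv[1:])) and be passed the peer hostnames from the
log as arguments. If this check passes and glusterd still fails to
resolve names, the delay in startup ordering is probably not the whole
story.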
[2015-12-02 22:50:17.493630] I [MSGID: 100030]
[glusterfsd.c:2318:main] 0-/usr/sbin/glusterd: Started running
/usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p
/var/run/glusterd.pid --log-level INFO)
[2015-12-02 22:50:17.916025] I [MSGID: 106478] [glusterd.c:1350:init]
0-management: Maximum allowed open file descriptors set to 65536
[2015-12-02 22:50:17.916063] I [MSGID: 106479] [glusterd.c:1399:init]
0-management: Using /var/lib/glusterd as working directory
[2015-12-02 22:50:17.980724] E
[rpc-transport.c:292:rpc_transport_load] 0-rpc-transport:
/usr/lib64/glusterfs/3.7.6/rpc-transport/rdma.so: cannot open shared
object file: No such file or directory
[2015-12-02 22:50:17.980743] W
[rpc-transport.c:296:rpc_transport_load] 0-rpc-transport: volume
'rdma.management': transport-type 'rdma' is not valid or not found on
this machine
[2015-12-02 22:50:17.980753] W
[rpcsvc.c:1597:rpcsvc_transport_create] 0-rpc-service: cannot create
listener, initing the transport failed
[2015-12-02 22:50:17.980762] E [MSGID: 106243] [glusterd.c:1623:init]
0-management: creation of 1 listeners failed, continuing with
succeeded transport
[2015-12-02 22:50:18.605503] I [MSGID: 106228]
[glusterd.c:433:glusterd_check_gsync_present] 0-glusterd:
geo-replication module not installed in the system [No such file or
directory]
[2015-12-02 22:50:18.669326] I [MSGID: 106513]
[glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd:
retrieved op-version: 30706
[2015-12-02 22:50:27.786383] I [MSGID: 106498]
[glusterd-handler.c:3579:glusterd_friend_add_from_peerinfo]
0-management: connect returned 0
[2015-12-02 22:50:27.809153] I
[rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2015-12-02 22:50:27.809078] I [MSGID: 106498]
[glusterd-handler.c:3579:glusterd_friend_add_from_peerinfo]
0-management: connect returned 0
[2015-12-02 22:50:37.844756] E [MSGID: 101075]
[common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed
(Name or service not known)
[2015-12-02 22:50:37.844822] E
[name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
resolution failed on host resching-os-control-02.node.consul
[2015-12-02 22:50:37.845167] I
[rpc-clnt.c:984:rpc_clnt_connection_init] 0-management: setting
frame-timeout to 600
[2015-12-02 22:50:37.845259] I [MSGID: 106004]
[glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management:
Peer <resching-os-control-02.node.consul>
(<9cf99313-dd68-4ac7-acbb-b018cc167ec2>), in state <Peer in Cluster>,
has disconnected from glusterd.
[2015-12-02 22:50:37.845321] E [MSGID: 106155]
[glusterd-utils.c:199:glusterd_unlock] 0-management: Cluster lock not
held!
[2015-12-02 22:50:47.880585] E [MSGID: 101075]
[common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed
(Name or service not known)
[2015-12-02 22:50:47.880675] E
[name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
resolution failed on host resching-os-control-01.node.consul
[2015-12-02 22:50:47.880870] I [MSGID: 106004]
[glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management:
Peer <resching-os-control-01.node.consul>
(<cc7ced64-e3c2-403d-ae01-59ad3f68d6e6>), in state <Peer in Cluster>,
has disconnected from glusterd.
[2015-12-02 22:50:47.880910] E [MSGID: 106155]
[glusterd-utils.c:199:glusterd_unlock] 0-management: Cluster lock not
held!
[2015-12-02 22:50:51.583949] E [MSGID: 101075]
[common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed
(Name or service not known)
[2015-12-02 22:50:51.584013] E
[name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
resolution failed on host resching-os-control-02.node.consul
[2015-12-02 22:50:51.584159] I [MSGID: 106004]
[glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management:
Peer <resching-os-control-02.node.consul>
(<9cf99313-dd68-4ac7-acbb-b018cc167ec2>), in state <Peer in Cluster>,
has disconnected from glusterd.
[2015-12-02 22:50:57.917351] E [MSGID: 106408]
[glusterd-peer-utils.c:120:glusterd_peerinfo_find_by_hostname]
0-management: error in getaddrinfo: Name or service not known
[Unknown error -2]
[2015-12-02 22:51:02.605954] E [MSGID: 101075]
[common-utils.c:306:gf_resolve_ip6] 0-resolver: getaddrinfo failed
(Name or service not known)
[2015-12-02 22:51:02.605990] E
[name.c:247:af_inet_client_get_remote_sockaddr] 0-management: DNS
resolution failed on host resching-os-control-01.node.consul
[2015-12-02 22:51:02.606077] I [MSGID: 106004]
[glusterd-handler.c:5065:__glusterd_peer_rpc_notify] 0-management:
Peer <resching-os-control-01.node.consul>
(<cc7ced64-e3c2-403d-ae01-59ad3f68d6e6>), in state <Peer in Cluster>,
has disconnected from glusterd.
[2015-12-02 22:51:07.938471] E [MSGID: 101075]
[common-utils.c:3127:gf_is_local_addr] 0-management: error in
getaddrinfo: Name or service not known
[2015-12-02 22:51:07.938526] E [MSGID: 106187]
[glusterd-store.c:4266:glusterd_resolve_all_bricks] 0-glusterd:
resolve brick failed in restore
[2015-12-02 22:51:07.938559] E [MSGID: 101019]
[xlator.c:428:xlator_init] 0-management: Initialization of volume
'management' failed, review your volfile again
[2015-12-02 22:51:07.938571] E [graph.c:322:glusterfs_graph_init]
0-management: initializing translator failed
[2015-12-02 22:51:07.938579] E [graph.c:661:glusterfs_graph_activate]
0-graph: init failed
[2015-12-02 22:51:07.947613] W [glusterfsd.c:1236:cleanup_and_exit]
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) [0x7fda0f9fc24d]
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x126) [0x7fda0f9fc0f6]
-->/usr/sbin/glusterd(cleanup_and_exit+0x69) [0x7fda0f9fb6d9] ) 0-:
received signum (0), shutting down
Thanks,
Brian Hicks
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users