On 03/31/2015 10:47 PM, Rumen Telbizov wrote:
Pranith and Atin,

Thank you for looking into this and confirming it's a bug. Please log the bug yourself, since I am not familiar with the project's bug-tracking system.

Given its severity and the fact that this effectively stops the cluster from functioning properly after boot, what do you think the timeline for fixing this issue would be? Which version do you expect to see this fixed in?

In the meantime, is there another workaround you might suggest besides running a second mount later, after boot is over?
Adding glusterd maintainers to the thread: +kaushal, +krishnan
I will let them answer your questions.
Pranith
Thank you again for your help,
Rumen Telbizov
On Tue, Mar 31, 2015 at 2:53 AM, Pranith Kumar Karampuri <[email protected]> wrote:
On 03/31/2015 01:55 PM, Atin Mukherjee wrote:
On 03/31/2015 01:03 PM, Pranith Kumar Karampuri wrote:
On 03/31/2015 12:53 PM, Atin Mukherjee wrote:
On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
Atin,
Could it be because the bricks are started with PROC_START_NO_WAIT?
That's the correct analysis, Pranith. The mount was attempted before the bricks were started. If we can have a time lag of a few seconds between the mount and the volume start, the problem will go away.
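A minimal sketch of introducing such a lag at boot, assuming the volume name myvolume and mount point /opt/shared from the report below; the ~60-second bound and 2-second interval are arbitrary choices:

#!/bin/sh
# Sketch: hold off mounting until glusterd can answer queries about the volume.
# VOLUME and MOUNTPOINT are taken from the report below; adjust as needed.
VOLUME=myvolume
MOUNTPOINT=/opt/shared

# Poll for up to ~60 seconds; 'gluster volume status' fails while glusterd
# is still starting up. A stricter check could also parse its "Online"
# column to confirm the bricks have actually signed in.
for i in $(seq 1 30); do
    if gluster volume status "$VOLUME" >/dev/null 2>&1; then
        break
    fi
    sleep 2
done

mount "$MOUNTPOINT"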
Atin,
I think one way to solve this issue is to start the bricks with NO_WAIT so that we can handle pmap-signin, but wait for the pmap-signins to complete before responding to the CLI/completing 'init'?
Logically that should solve the problem. We need to think it through more from the existing design perspective.
Rumen,
Feel free to log a bug. This should be fixed in a later release. We can raise the bug and work on it as well, if you prefer it that way.
Pranith
~Atin
On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
Hello everyone,
I have a problem that I am trying to resolve and am not sure which way to go, so I am asking for your advice.

What it comes down to is that upon initial boot of all my GlusterFS machines, the shared volume doesn't get mounted, even though the volume is successfully created and started, and further attempts to mount it manually succeed. I suspect what's happening is that the gluster processes/bricks/etc. haven't fully started by the time the /etc/fstab entry is read and the initial mount attempt is made. Again, by the time I log in and run mount -a, the volume mounts without any issues.
_Details from the logs:_
[2015-03-30 22:29:04.381918] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-server=10.12.130.21 --volfile-server=10.12.130.22 --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
[2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-03-30 22:29:04.394950] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2015-03-30 22:29:04.394964] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.21
[2015-03-30 22:29:08.390687] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.21 (Transport endpoint is not connected)
[2015-03-30 22:29:08.390720] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.22
[2015-03-30 22:29:11.392015] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.22 (Transport endpoint is not connected)
[2015-03-30 22:29:11.392050] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.23
[2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex] 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-30 22:29:14.408964] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting frame-timeout to 60
[2015-03-30 22:29:14.409183] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting frame-timeout to 60
[2015-03-30 22:29:14.409388] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting frame-timeout to 60
[2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2: parent translators are ready, attempting connect on transport
Final graph:
....
[2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify] 0-host-client-2: disconnected from host-client-2. Client process will keep trying to connect to glusterd until brick's port is available
*[2015-03-30 22:29:14.411063] E [MSGID: 108006] [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.*
[2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
[2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /opt/shared
*[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down*
[2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/opt/shared'.
_Relevant /etc/fstab entries are:_
/dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0
_Volume configuration is:_
Volume Name: myvolume
Type: Replicate
Volume ID: xxxx
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1:/opt/local/brick
Brick2: host2:/opt/local/brick
Brick3: host3:/opt/local/brick
Options Reconfigured:
storage.health-check-interval: 5
network.ping-timeout: 5
nfs.disable: on
auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
cluster.quorum-type: auto
network.frame-timeout: 60
I run Debian 7 with GlusterFS version 3.6.2-2.

While I could put together some rc.local-type script which retries mounting the volume for a while until it succeeds or times out, I was wondering if there's a better way to solve this problem?
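For what it's worth, a minimal sketch of such a retry script, assuming the /opt/shared mount point from the fstab entry above; the ~60-second bound and 2-second interval are arbitrary choices:

#!/bin/sh
# Sketch of an rc.local-style retry loop: keep re-running 'mount -a' until
# the GlusterFS volume from /etc/fstab is mounted, or give up.
MOUNTPOINT=/opt/shared

for i in $(seq 1 30); do
    # Stop as soon as the volume is mounted.
    if mountpoint -q "$MOUNTPOINT"; then
        exit 0
    fi
    # 'mount -a' retries every fstab entry that is not currently mounted,
    # including the glusterfs one.
    mount -a
    sleep 2
done

echo "gave up waiting for $MOUNTPOINT" >&2
exit 1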
Thank you for your help.
Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users