Any update here? Can I hope to see a fix incorporated into the 3.6.3 release?
On Tue, Mar 31, 2015 at 10:53 AM, Pranith Kumar Karampuri <[email protected]> wrote:

> On 03/31/2015 10:47 PM, Rumen Telbizov wrote:
>
> Pranith and Atin,
>
> Thank you for looking into this and confirming it's a bug. Please log
> the bug yourself, since I am not familiar with the project's bug-tracking
> system.
>
> Considering its severity, and the fact that this effectively stops the
> cluster from functioning properly after boot, what do you think the
> timeline for fixing this issue would be? What version do you expect to
> see this fixed in?
>
> In the meantime, is there another workaround you might suggest besides
> running a second mount attempt after boot has finished?
>
> Adding glusterd maintainers to the thread: +kaushal, +krishnan
> I will let them answer your questions.
>
> Pranith
>
> Thank you again for your help,
> Rumen Telbizov
>
>
> On Tue, Mar 31, 2015 at 2:53 AM, Pranith Kumar Karampuri <[email protected]> wrote:
>
>> On 03/31/2015 01:55 PM, Atin Mukherjee wrote:
>>
>>> On 03/31/2015 01:03 PM, Pranith Kumar Karampuri wrote:
>>>
>>>> On 03/31/2015 12:53 PM, Atin Mukherjee wrote:
>>>>
>>>>> On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>> Atin,
>>>>>> Could it be because the bricks are started with PROC_START_NO_WAIT?
>>>>>>
>>>>> That's the correct analysis, Pranith. The mount was attempted before
>>>>> the bricks were started. If we can introduce a lag of a few seconds
>>>>> between volume start and mount, the problem will go away.
>>>>>
>>>> Atin,
>>>> I think one way to solve this issue is to keep starting the bricks
>>>> with NO_WAIT, so that we can still handle pmap-signin, but wait for
>>>> the pmap-signins to complete before responding to the CLI /
>>>> completing 'init'?
>>>>
>>> Logically that should solve the problem. We need to think it through
>>> more from the existing design perspective.
>>>
>> Rumen,
>> Feel free to log a bug. This should be fixed in a later release. We
>> can raise the bug and work on it as well, if you prefer it that way.
>>
>> Pranith
>>
>>> ~Atin
>>>
>>>> Pranith
>>>>
>>>>> Pranith
>>>>>
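In the meantime, the time-lag workaround suggested above can be approximated with a boot-time script: wait until glusterd answers and the volume reports as started, then attempt the mount. Below is a rough, untested sketch along those lines; the volume name and mount point come from my original report (quoted below), and the retry and sleep values are arbitrary:

    #!/bin/sh
    # Wait for glusterd to come up and for the volume to be started
    # before mounting, instead of racing the boot-time /etc/fstab mount.
    # Note: 'gluster volume status' succeeding does not strictly
    # guarantee that every brick has signed in its port yet, hence the
    # extra grace period below.
    VOLUME=myvolume
    MOUNTPOINT=/opt/shared

    for attempt in $(seq 1 30); do
        if gluster volume status "$VOLUME" >/dev/null 2>&1; then
            sleep 2   # grace period for brick pmap sign-ins
            mount "$MOUNTPOINT" && exit 0
        fi
        sleep 2
    done
    echo "gave up waiting for volume $VOLUME" >&2
    exit 1
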
>>>>>> On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
>>>>>>
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I have a problem that I am trying to resolve and I'm not sure which
>>>>>>> way to go, so here I am asking for your advice.
>>>>>>>
>>>>>>> What it comes down to is that upon initial boot of all my GlusterFS
>>>>>>> machines, the shared volume doesn't get mounted. Nevertheless, the
>>>>>>> volume is successfully created and started, and further attempts to
>>>>>>> mount it manually succeed. I suspect what's happening is that the
>>>>>>> gluster processes/bricks/etc. haven't fully started by the time the
>>>>>>> /etc/fstab entry is read and the initial mount attempt is made.
>>>>>>> Again, by the time I log in and run mount -a, the volume mounts
>>>>>>> without any issues.
>>>>>>>
>>>>>>> _Details from the logs:_
>>>>>>>
>>>>>>> [2015-03-30 22:29:04.381918] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-server=10.12.130.21 --volfile-server=10.12.130.22 --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
>>>>>>> [2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
>>>>>>> [2015-03-30 22:29:04.394950] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
>>>>>>> [2015-03-30 22:29:04.394964] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.21
>>>>>>> [2015-03-30 22:29:08.390687] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.21 (Transport endpoint is not connected)
>>>>>>> [2015-03-30 22:29:08.390720] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.22
>>>>>>> [2015-03-30 22:29:11.392015] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.22 (Transport endpoint is not connected)
>>>>>>> [2015-03-30 22:29:11.392050] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.23
>>>>>>> [2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex] 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
>>>>>>> [2015-03-30 22:29:14.408964] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting frame-timeout to 60
>>>>>>> [2015-03-30 22:29:14.409183] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting frame-timeout to 60
>>>>>>> [2015-03-30 22:29:14.409388] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting frame-timeout to 60
>>>>>>> [2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0: parent translators are ready, attempting connect on transport
>>>>>>> [2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1: parent translators are ready, attempting connect on transport
>>>>>>> [2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2: parent translators are ready, attempting connect on transport
>>>>>>> Final graph:
>>>>>>>
>>>>>>> ....
>>>>>>>
>>>>>>> [2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify] 0-host-client-2: disconnected from host-client-2. Client process will keep trying to connect to glusterd until brick's port is available
>>>>>>> *[2015-03-30 22:29:14.411063] E [MSGID: 108006] [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.*
>>>>>>> [2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>>> [2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
>>>>>>> [2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
>>>>>>> [2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
>>>>>>> [2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
>>>>>>> [2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /opt/shared
>>>>>>> *[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down*
>>>>>>> [2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/opt/shared'.
>>>>>>>
>>>>>>> _Relevant /etc/fstab entries are:_
>>>>>>>
>>>>>>> /dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
>>>>>>>
>>>>>>> localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0
>>>>>>>
>>>>>>> _Volume configuration is:_
>>>>>>>
>>>>>>> Volume Name: myvolume
>>>>>>> Type: Replicate
>>>>>>> Volume ID: xxxx
>>>>>>> Status: Started
>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: host1:/opt/local/brick
>>>>>>> Brick2: host2:/opt/local/brick
>>>>>>> Brick3: host3:/opt/local/brick
>>>>>>> Options Reconfigured:
>>>>>>> storage.health-check-interval: 5
>>>>>>> network.ping-timeout: 5
>>>>>>> nfs.disable: on
>>>>>>> auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
>>>>>>> cluster.quorum-type: auto
>>>>>>> network.frame-timeout: 60
>>>>>>>
>>>>>>> I run Debian 7 with GlusterFS version 3.6.2-2.
>>>>>>>
>>>>>>> While I could put together some rc.local-type script that retries
>>>>>>> mounting the volume for a while until it succeeds or times out, I
>>>>>>> was wondering if there's a better way to solve this problem?
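>>>>>>>
>>>>>>> Roughly what I have in mind is something like this (an untested
>>>>>>> sketch; the mount point is the one from the fstab entry above, and
>>>>>>> the timeout values are arbitrary):
>>>>>>>
>>>>>>>     #!/bin/sh
>>>>>>>     # rc.local-style fallback: keep retrying the fstab-defined
>>>>>>>     # GlusterFS mount until it succeeds or a timeout expires.
>>>>>>>     MOUNTPOINT=/opt/shared
>>>>>>>     TIMEOUT=120     # seconds before giving up
>>>>>>>     INTERVAL=5      # seconds between attempts
>>>>>>>
>>>>>>>     elapsed=0
>>>>>>>     until mountpoint -q "$MOUNTPOINT"; do
>>>>>>>         if [ "$elapsed" -ge "$TIMEOUT" ]; then
>>>>>>>             echo "timed out waiting to mount $MOUNTPOINT" >&2
>>>>>>>             exit 1
>>>>>>>         fi
>>>>>>>         # Re-attempt the mount with the options from /etc/fstab.
>>>>>>>         mount "$MOUNTPOINT" 2>/dev/null
>>>>>>>         sleep "$INTERVAL"
>>>>>>>         elapsed=$((elapsed + INTERVAL))
>>>>>>>     done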
>>>>>>>
>>>>>>> Thank you for your help.
>>>>>>>
>>>>>>> Regards,
>>>>>>> --
>>>>>>> Rumen Telbizov
>>>>>>> Unix Systems Administrator <http://telbizov.com>

--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
