On 03/31/2015 10:47 PM, Rumen Telbizov wrote:
Pranith and Atin,

Thank you for looking into this and confirming it's a bug. Please log the bug yourself, since I am not familiar with the project's bug-tracking system.

Given its severity and the fact that this effectively stops the cluster from functioning properly after boot, what do you think the timeline for fixing this issue would be? Which version do you expect to see this fixed in?

In the meantime, is there another workaround you might suggest besides running a second mount later, after boot is over?
Adding glusterd maintainers to the thread: +kaushal, +krishnan
I will let them answer your questions.
Pranith
Thank you again for your help,
Rumen Telbizov
On Tue, Mar 31, 2015 at 2:53 AM, Pranith Kumar Karampuri <[email protected]> wrote:
On 03/31/2015 01:55 PM, Atin Mukherjee wrote:
On 03/31/2015 01:03 PM, Pranith Kumar Karampuri wrote:
On 03/31/2015 12:53 PM, Atin Mukherjee wrote:
On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
Atin,
Could it be because the bricks are started with PROC_START_NO_WAIT?
That's the correct analysis, Pranith. The mount was attempted before the bricks were started. If we can have a time lag of a few seconds between the mount and the volume start, the problem will go away.
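A minimal sketch of introducing such a lag at boot, assuming the volume name myvolume and mount point /opt/shared from the report below; the ~60-second bound and 2-second interval are arbitrary choices:

#!/bin/sh
# Sketch: hold off mounting until glusterd can answer queries about the volume.
# VOLUME and MOUNTPOINT are taken from the report below; adjust as needed.
VOLUME=myvolume
MOUNTPOINT=/opt/shared

# Poll for up to ~60 seconds; 'gluster volume status' fails while glusterd
# is still starting up. A stricter check could also parse its "Online"
# column to confirm the bricks have actually signed in.
for i in $(seq 1 30); do
    if gluster volume status "$VOLUME" >/dev/null 2>&1; then
        break
    fi
    sleep 2
done

mount "$MOUNTPOINT"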
Atin,
I think one way to solve this issue is to start the bricks with NO_WAIT so that we can handle pmap-signin, but wait for the pmap-signins to complete before responding to the CLI/completing 'init'?
Logically that should solve the problem. We need to think it through more from the existing design perspective.
Rumen,
Feel free to log a bug. This should be fixed in a later release. We can raise the bug and work on it as well, if you prefer it that way.
Pranith
~Atin
On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
Hello everyone,
I have a problem that I am trying to resolve and am not sure which way to go, so I am asking for your advice.

What it comes down to is that upon initial boot of all my GlusterFS machines, the shared volume doesn't get mounted, even though the volume is successfully created and started, and further attempts to mount it manually succeed. I suspect what's happening is that the gluster processes/bricks/etc. haven't fully started by the time the /etc/fstab entry is read and the initial mount attempt is made. Again, by the time I log in and run mount -a, the volume mounts without any issues.
_Details from the logs:_
[2015-03-30 22:29:04.381918] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.2 (args: /usr/sbin/glusterfs --log-file=/var/log/glusterfs/glusterfs.log --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-server=10.12.130.21 --volfile-server=10.12.130.22 --volfile-server=10.12.130.23 --volfile-id=/myvolume /opt/shared)
[2015-03-30 22:29:04.394913] E [socket.c:2267:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-03-30 22:29:04.394950] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport endpoint is not connected)
[2015-03-30 22:29:04.394964] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.21
[2015-03-30 22:29:08.390687] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.21 (Transport endpoint is not connected)
[2015-03-30 22:29:08.390720] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.22
[2015-03-30 22:29:11.392015] E [glusterfsd-mgmt.c:1811:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.12.130.22 (Transport endpoint is not connected)
[2015-03-30 22:29:11.392050] I [glusterfsd-mgmt.c:1838:mgmt_rpc_notify] 0-glusterfsd-mgmt: connecting to next volfile server 10.12.130.23
[2015-03-30 22:29:14.406429] I [dht-shared.c:337:dht_init_regex] 0-brain-dht: using regex rsync-hash-regex = ^\.(.+)\.[^.]+$
[2015-03-30 22:29:14.408964] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-2: setting frame-timeout to 60
[2015-03-30 22:29:14.409183] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-1: setting frame-timeout to 60
[2015-03-30 22:29:14.409388] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-host-client-0: setting frame-timeout to 60
[2015-03-30 22:29:14.409430] I [client.c:2280:notify] 0-host-client-0: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409658] I [client.c:2280:notify] 0-host-client-1: parent translators are ready, attempting connect on transport
[2015-03-30 22:29:14.409844] I [client.c:2280:notify] 0-host-client-2: parent translators are ready, attempting connect on transport
Final graph:
....
[2015-03-30 22:29:14.411045] I [client.c:2215:client_rpc_notify] 0-host-client-2: disconnected from host-client-2. Client process will keep trying to connect to glusterd until brick's port is available
*[2015-03-30 22:29:14.411063] E [MSGID: 108006] [afr-common.c:3591:afr_notify] 0-myvolume-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.*
[2015-03-30 22:29:14.414871] I [fuse-bridge.c:5080:fuse_graph_setup] 0-fuse: switched to graph 0
[2015-03-30 22:29:14.415003] I [fuse-bridge.c:4009:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.17
[2015-03-30 22:29:14.415101] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415215] I [afr-common.c:3722:afr_local_init] 0-myvolume-replicate-0: no subvolumes up
[2015-03-30 22:29:14.415236] W [fuse-bridge.c:779:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected)
[2015-03-30 22:29:14.419007] I [fuse-bridge.c:4921:fuse_thread_proc] 0-fuse: unmounting /opt/shared
*[2015-03-30 22:29:14.420176] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (15), shutting down*
[2015-03-30 22:29:14.420192] I [fuse-bridge.c:5599:fini] 0-fuse: Unmounting '/opt/shared'.
_Relevant /etc/fstab entries are:_
/dev/xvdb /opt/local xfs defaults,noatime,nodiratime 0 0
localhost:/myvolume /opt/shared glusterfs defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23 0 0
_Volume configuration is:_
Volume Name: myvolume
Type: Replicate
Volume ID: xxxx
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: host1:/opt/local/brick
Brick2: host2:/opt/local/brick
Brick3: host3:/opt/local/brick
Options Reconfigured:
storage.health-check-interval: 5
network.ping-timeout: 5
nfs.disable: on
auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
cluster.quorum-type: auto
network.frame-timeout: 60
I run Debian 7 with GlusterFS version 3.6.2-2.

While I could put together some rc.local-type script which retries mounting the volume for a while until it succeeds or times out, I was wondering if there's a better way to solve this problem?
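For what it's worth, a minimal sketch of such a retry script, assuming the /opt/shared mount point from the fstab entry above; the ~60-second bound and 2-second interval are arbitrary choices:

#!/bin/sh
# Sketch of an rc.local-style retry loop: keep re-running 'mount -a' until
# the GlusterFS volume from /etc/fstab is mounted, or give up.
MOUNTPOINT=/opt/shared

for i in $(seq 1 30); do
    # Stop as soon as the volume is mounted.
    if mountpoint -q "$MOUNTPOINT"; then
        exit 0
    fi
    # 'mount -a' retries every fstab entry that is not currently mounted,
    # including the glusterfs one.
    mount -a
    sleep 2
done

echo "gave up waiting for $MOUNTPOINT" >&2
exit 1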
Thank you for your help.
Regards,
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
--
Rumen Telbizov
Unix Systems Administrator <http://telbizov.com>
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users