Thank you, I created a bug with all the logs: https://bugzilla.redhat.com/show_bug.cgi?id=1467050
During testing I found a second bug: https://bugzilla.redhat.com/show_bug.cgi?id=1467057
There is something wrong with Ganesha when Gluster bricks are named "w0" or "sw0".

On Fri, Jun 30, 2017 at 11:36 AM, Hari Gowtham <[email protected]> wrote:
> Hi,
>
> Jan, by multiple times I meant whether you were able to do the whole setup
> multiple times and face the same issue, so that we have a consistent
> reproducer to work on.
>
> As grepping shows that the process doesn't exist, the bug I mentioned doesn't
> hold good. It seems like another issue, irrelevant to the bug I mentioned
> (I have mentioned it now).
>
> When you say "too often", that means there is a way to reproduce it.
> Please do let us know the steps you performed to check, but this shouldn't
> happen if you try again.
>
> You won't have this issue often, and as Mani mentioned, do not write a script
> to force start it. If this issue exists with a proper reproducer, we will
> take a look at it.
>
> Sorry, I forgot to provide the link for the fix:
> patch: https://review.gluster.org/#/c/17101/
>
> If you find a reproducer, do file a bug at
> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
>
>
> On Fri, Jun 30, 2017 at 3:33 PM, Manikandan Selvaganesh
> <[email protected]> wrote:
> > Hi Jan,
> >
> > It is not recommended that you automate a script for 'volume start force'.
> > Bricks do not go offline just like that; there will be some genuine issue
> > which triggers this. Could you please attach the entire glusterd logs and
> > the brick logs around that time so that someone would be able to look?
> >
> > Just to make sure, please check if you have any network outage (using iperf
> > or some standard tool).
> >
> > @Hari, I think you forgot to provide the bug link; please provide it so
> > that Jan or someone can check if it is related.
> >
> >
> > --
> > Thanks & Regards,
> > Manikandan Selvaganesan.
> > (@Manikandan Selvaganesh on Web)
> >
> > On Fri, Jun 30, 2017 at 3:19 PM, Jan <[email protected]> wrote:
> >>
> >> Hi Hari,
> >>
> >> thank you for your support!
> >>
> >> Did I try to check offline bricks multiple times?
> >> Yes – I gave it enough time (at least 20 minutes) to recover, but it
> >> stayed offline.
> >>
> >> Version?
> >> All nodes are 100% equal – I tried a fresh installation several times
> >> during my testing. Every time it is a CentOS Minimal install with all
> >> updates and without any additional software:
> >>
> >> uname -r
> >> 3.10.0-514.21.2.el7.x86_64
> >>
> >> yum list installed | egrep 'gluster|ganesha'
> >> centos-release-gluster310.noarch   1.0-1.el7.centos   @extras
> >> glusterfs.x86_64                   3.10.2-1.el7       @centos-gluster310
> >> glusterfs-api.x86_64               3.10.2-1.el7       @centos-gluster310
> >> glusterfs-cli.x86_64               3.10.2-1.el7       @centos-gluster310
> >> glusterfs-client-xlators.x86_64    3.10.2-1.el7       @centos-gluster310
> >> glusterfs-fuse.x86_64              3.10.2-1.el7       @centos-gluster310
> >> glusterfs-ganesha.x86_64           3.10.2-1.el7       @centos-gluster310
> >> glusterfs-libs.x86_64              3.10.2-1.el7       @centos-gluster310
> >> glusterfs-server.x86_64            3.10.2-1.el7       @centos-gluster310
> >> libntirpc.x86_64                   1.4.3-1.el7        @centos-gluster310
> >> nfs-ganesha.x86_64                 2.4.5-1.el7        @centos-gluster310
> >> nfs-ganesha-gluster.x86_64         2.4.5-1.el7        @centos-gluster310
> >> userspace-rcu.x86_64               0.7.16-3.el7       @centos-gluster310
> >>
> >> Grepping for the brick process?
> >> I've just tried it again. The process doesn't exist when the brick is
> >> offline.
> >>
> >> Force start command?
> >> sudo gluster volume start MyVolume force
> >>
> >> That works! Thank you.
> >>
> >> If I have this issue too often, then I can create a simple script that
> >> greps all bricks on the local server and force starts any that are
> >> offline. I could schedule such a script to run once, for example 5 minutes
> >> after boot.
> >>
> >> But I'm not sure if it's a good idea to automate it. I'd be worried that I
> >> could force a brick up even when the node doesn't "see" the other nodes
> >> and cause a split brain issue.
> >>
> >> Thank you!
> >>
> >> Kind regards,
> >> Jan
> >>
> >>
> >> On Fri, Jun 30, 2017 at 8:01 AM, Hari Gowtham <[email protected]> wrote:
> >>>
> >>> Hi Jan,
> >>>
> >>> comments inline.
> >>>
> >>> On Fri, Jun 30, 2017 at 1:31 AM, Jan <[email protected]> wrote:
> >>> > Hi all,
> >>> >
> >>> > Gluster and Ganesha are amazing. Thank you for this great work!
> >>> >
> >>> > I'm struggling with one issue and I think that you might be able to
> >>> > help me.
> >>> >
> >>> > I spent some time playing with Gluster and Ganesha, and after I gained
> >>> > some experience I decided that I should go into production, but I'm
> >>> > still struggling with one issue.
> >>> >
> >>> > I have 3x node CentOS 7.3 with the most current Gluster and Ganesha
> >>> > from the centos-gluster310 repository (3.10.2-1.el7) with replicated
> >>> > bricks.
> >>> >
> >>> > The servers have a lot of resources and they run in a subnet on a
> >>> > stable network.
> >>> >
> >>> > I didn't have any issues when I tested a single brick. But now I'd like
> >>> > to set up 17 replicated bricks, and I realized that when I restart one
> >>> > of the nodes the result looks like this:
> >>> >
> >>> > sudo gluster volume status | grep ' N '
> >>> >
> >>> > Brick glunode0:/st/brick3/dir        N/A       N/A        N       N/A
> >>> > Brick glunode1:/st/brick2/dir        N/A       N/A        N       N/A
> >>> >
> >>>
> >>> did you try it multiple times?
> >>>
> >>> > Some bricks just don't go online. Sometimes it's one brick, sometimes
> >>> > three, and it's not the same brick – it's a random issue.
> >>> >
> >>> > I checked the log on the affected servers and this is an example:
> >>> >
> >>> > sudo tail /var/log/glusterfs/bricks/st-brick3-0.log
> >>> >
> >>> > [2017-06-29 17:59:48.651581] W [socket.c:593:__socket_rwv] 0-glusterfs:
> >>> > readv on 10.2.44.23:24007 failed (No data available)
> >>> > [2017-06-29 17:59:48.651622] E [glusterfsd-mgmt.c:2114:mgmt_rpc_notify]
> >>> > 0-glusterfsd-mgmt: failed to connect with remote-host: glunode0 (No
> >>> > data available)
> >>> > [2017-06-29 17:59:48.651638] I [glusterfsd-mgmt.c:2133:mgmt_rpc_notify]
> >>> > 0-glusterfsd-mgmt: Exhausted all volfile servers
> >>> > [2017-06-29 17:59:49.944103] W [glusterfsd.c:1332:cleanup_and_exit]
> >>> > (-->/lib64/libpthread.so.0(+0x7dc5) [0x7f3158032dc5]
> >>> > -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x7f31596cbfd5]
> >>> > -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x7f31596cbdfb] )
> >>> > 0-: received signum (15), shutting down
> >>> > [2017-06-29 17:59:50.397107] E [socket.c:3203:socket_connect]
> >>> > 0-glusterfs: connection attempt on 10.2.44.23:24007 failed, (Network is
> >>> > unreachable)
> >>> > [2017-06-29 17:59:50.397138] I [socket.c:3507:socket_submit_request]
> >>> > 0-glusterfs: not connected (priv->connected = 0)
> >>> > [2017-06-29 17:59:50.397162] W [rpc-clnt.c:1693:rpc_clnt_submit]
> >>> > 0-glusterfs: failed to submit rpc-request (XID: 0x3 Program: Gluster
> >>> > Portmap, ProgVers: 1, Proc: 5) to rpc-transport (glusterfs)
> >>> >
> >>> > I think the important message is "Network is unreachable".
> >>> >
> >>> > Questions
> >>> > 1. Could you please tell me, is that normal when you have many bricks?
> >>> > The network is definitely stable, other servers use it without
> >>> > problems, and all servers run on the same pair of switches. My
> >>> > assumption is that many bricks try to connect at the same time and
> >>> > that doesn't work.
> >>>
> >>> No, it shouldn't happen when there are multiple bricks. There was a bug
> >>> related to this [1]. To verify whether that was the issue I need to know
> >>> a few things:
> >>> 1) are all the nodes on the same version?
> >>> 2) did you check grepping for the brick process using the ps command?
> >>> I need to verify whether the brick is still up and just not connected to
> >>> glusterd.
> >>>
> >>>
> >>> >
> >>> > 2. Is there an option to configure a brick to enable some kind of
> >>> > autoreconnect or add some timeout?
> >>> > gluster volume set brick123 option456 abc ??
> >>> If the brick process is not seen in ps aux | grep glusterfsd, the way to
> >>> start the brick is to use the volume start force command. If the brick is
> >>> not started there is no point configuring it, and we can't use the
> >>> configure command to start a brick.
> >>>
> >>> >
> >>> > 3. What is the recommended way to fix an offline brick on the affected
> >>> > server? I don't want to use "gluster volume stop/start" since the
> >>> > affected bricks are online on the other servers and there is no reason
> >>> > to completely turn them off.
> >>> gluster volume start force will not bring down the bricks that are
> >>> already up and running.
> >>>
> >>> >
> >>> > Thank you,
> >>> > Jan
> >>> >
> >>> > _______________________________________________
> >>> > Gluster-users mailing list
> >>> > [email protected]
> >>> > http://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >>>
> >>> --
> >>> Regards,
> >>> Hari Gowtham.
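For what it's worth, this is roughly the check I had in mind – a minimal sketch only, which assumes bash, that the node name printed by "gluster volume status" matches "hostname -s", and that the Online flag is the second-to-last column of the Brick lines. It only lists offline local bricks and the running brick processes; it does not force start anything:

#!/bin/bash
# Report local bricks that "gluster volume status" shows as offline (Online = N).
host=$(hostname -s)
sudo gluster volume status | awk -v h="$host" \
    '$1 == "Brick" && $2 ~ ("^" h ":") && $(NF-1) == "N" { print "offline:", $2 }'
# Cross-check which glusterfsd brick processes are actually running.
ps aux | grep '[g]lusterfsd'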
> >>
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> [email protected]
> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
>
> --
> Regards,
> Hari Gowtham.
>
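Regarding the network check that Manikandan suggested: a quick sanity test between two of the nodes could look like the following. This is only a sketch and assumes iperf3 is installed on both nodes and that its default port (5201) is open between them; "glunode0" is just the example node name from above.

# on glunode0: run the server side
iperf3 -s
# on another node: measure throughput towards glunode0 for 10 seconds
iperf3 -c glunode0 -t 10

A plain "ping -c 100 glunode0" run while a node reboots would also show whether the address briefly becomes unreachable, which would match the "Network is unreachable" errors in the brick log.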
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
