I suspect this didn't go through - forwarding.

----- Original Message -----
> Harry,
> 
> Could you paste/attach the contents of the /var/lib/glusterd/vols/gli/info
> files and the glusterd log files from the 4 peers in the cluster?
> From the volume-info snippet you pasted, it appears that the node
> which was shut down differs in its view of the volume's status.
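[For the record, one way to pull those files together is sketched below; the hostnames are taken from this thread, and the glusterd log filename is an assumption that may differ per install.]

```shell
# Sketch: build the copy commands for each peer's volume-info file and
# glusterd log. The commands are collected in $cmds so they can be
# reviewed first, then run with: printf '%s\n' "$cmds" | sh
cmds=$(for h in pbs1 pbs2 pbs3 pbs4; do
  echo "scp $h:/var/lib/glusterd/vols/gli/info $h-info"
  echo "scp $h:/var/log/glusterfs/etc-glusterfs-glusterd.vol.log $h-glusterd.log"
done)
printf '%s\n' "$cmds"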
> 
> thanks,
> krish
> 
> ----- Original Message -----
> > From: "harry mangalam" <[email protected]>
> > To: [email protected]
> > Sent: Monday, October 8, 2012 2:49:05 AM
> > Subject: Re: [Gluster-users] volume started but not 'startable',
> >     not 'stoppable'
> > 
> > And a few more data points: it appears the reason for the flaky
> > gluster fs is that not all the servers are running glusterfsd's
> > (see below). Is there a way to force the servers to all start the
> > glusterfsd's as they're supposed to?
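[For what it's worth, the 3.3 CLI has a force variant of volume start that asks glusterd to (re)spawn any brick daemons it believes should be running. A sketch of the sequence, with hostnames from this thread, echoed as a dry run so it can be reviewed before piping to sh on a peer:]

```shell
# Sketch (dry run): respawn missing brick daemons, then verify that
# each peer actually has a glusterfsd process for its brick.
plan=$(
  echo "gluster volume start gli force"
  for h in pbs1 pbs2 pbs3 pbs4; do
    echo "ssh $h pgrep -fl glusterfsd"
  done
)
printf '%s\n' "$plan"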
> > 
> > The mystery rebalance did complete, and seems to have fixed some
> > but not all problem files, i.e.:
> > 
> > > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
> > 
> > And the started/not started status has gotten weirder, if possible...
> > 
> > The gluster volume is still being exported to clients, despite
> > gluster insisting that the volume is not started (servers are
> > pbs[1234]). Result of:
> > $ gluster volume status
> > pbs1:Volume gli is not started
> > pbs2:Volume gli is not started
> > pbs3:Volume gli is not started
> > pbs4:Volume gli is not started
> > 
> > $ gluster volume info:
> > pbs1:Status: Stopped
> > pbs2:Status: Started  <- aha!
> > pbs3:Status: Started  <- aha!
> > pbs4:Status: Started
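[Mechanically, that per-peer disagreement is the thing to look for; a throwaway check over the summary pasted above, purely illustrative:]

```shell
# Sketch: flag a volume whose peers report different Status values.
# The heredoc is the per-peer summary from above; in practice it would
# come from running 'gluster volume info gli' on each peer.
cat > /tmp/gli-status.txt <<'EOF'
pbs1:Status: Stopped
pbs2:Status: Started
pbs3:Status: Started
pbs4:Status: Started
EOF
# more than one distinct Status value means the peers disagree
distinct=$(cut -d: -f3 /tmp/gli-status.txt | sort -u | wc -l)
if [ "$distinct" -gt 1 ]; then
  echo "peers disagree about the volume's status"
fi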
> > 
> > This correlates with the glusterfsd status, in which only pbs[23]
> > are running glusterfsd:
> > 
> > pbs2:root      1799  0.1  0.0 184296 16464 ?        Ssl  13:07   0:06
> >   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs2ib.bducgl
> >   -p /var/lib/glusterd/vols/gli/run/pbs2ib-bducgl.pid
> >   -S /tmp/c70b2f910e2fe1bb485b1d76ef63e3db.socket
> >   --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
> >   --xlator-option *-posix.glusterd-uuid=26de63bd-c5b7-48ba-b81d-5d77a533d077
> >   --brick-port 24025 24026
> >   --xlator-option gli-server.transport.rdma.listen-port=24026
> >   --xlator-option gli-server.listen-port=24025
> > 
> > pbs3:root      1751  0.1  0.0 184168 16468 ?        Ssl  13:07   0:06
> >   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs3ib.bducgl
> >   -p /var/lib/glusterd/vols/gli/run/pbs3ib-bducgl.pid
> >   -S /tmp/7096377992feb7f5a7805cafd82c3100.socket
> >   --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
> >   --xlator-option *-posix.glusterd-uuid=c79c4084-d6b9-4af9-b975-40dd6aa99b42
> >   --brick-port 24018 24020
> >   --xlator-option gli-server.transport.rdma.listen-port=24020
> >   --xlator-option gli-server.listen-port=24018
> > 
> > pbs[14] are only running the glusterd process, not any glusterfsd's
> > 
> > In previous startups, pbs4 WAS running a glusterfsd, but pbs1 has
> > not run one since the powerdown, AFAIK.
> > 
> > hjm
> > 
> > 
> > On Saturday, October 06, 2012 10:19:14 PM harry mangalam wrote:
> > > ...and should have added:
> > > 
> > > the rebalance log (the volume claimed to be rebalancing before I
> > > shut it down, but was idle or wedged at that time) is active as
> > > well, with about one warning of "1 subvolumes down -- not fixing"
> > > for every 3 informational messages:
> > > 
> > > [2012-10-06 22:05:35.396650] I [dht-rebalance.c:1058:gf_defrag_migrate_data]
> > > 0-gli-dht: migrate data called on
> > > /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/tmp/wcprops
> > > 
> > > [2012-10-06 22:05:35.451925] I [dht-layout.c:593:dht_layout_normalize]
> > > 0-gli-dht: found anomalies in
> > > /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/wcprops.
> > > holes=1 overlaps=0
> > > 
> > > [2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:dht_selfheal_directory]
> > > 0-gli-dht: 1 subvolumes down -- not fixing
> > > 
> > > 
> > > previously...
> > > 
> > > gluster 3.3, running on ubuntu 10.04, was running OK; I had to
> > > shut it down for a power outage.
> > > 
> > > When I tried to shut it down, it insisted that it was
> > > rebalancing, but seemed wedged - no activity in the logs.
> > > 
> > > I was able to shut it down, though.
> > > 
> > > After power was restored, I tried to restart the volume, but
> > > although the 4 peers claimed to be visible and could ping each
> > > other etc.:
> > > ==============================================
> > > Sat Oct 06 21:38:07 [0.81 0.71 0.58]
> > >  root@pbs2:/var/log/glusterfs/bricks
> > > 567 $ gluster peer status
> > > Number of Peers: 3
> > > 
> > > Hostname: pbs3ib
> > > Uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
> > > State: Peer in Cluster (Connected)
> > > 
> > > Hostname: 10.255.77.2
> > > Uuid: 3fcd023c-9cc9-4d1c-84c4-babfb4492e38
> > > State: Peer in Cluster (Connected)
> > > 
> > > Hostname: pbs4ib
> > > Uuid: 2a593581-bf45-446c-8f7c-212c53297803
> > > State: Peer in Cluster (Connected)
> > > ==============================================
> > > 
> > > and the volume info seemed to be OK:
> > > ==============================================
> > > Sat Oct 06 21:36:11 [0.75 0.67 0.56]
> > >  root@pbs2:/var/log/glusterfs/bricks
> > > 565 $ gluster volume info gli
> > > 
> > > Volume Name: gli
> > > Type: Distribute
> > > Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
> > > Status: Started
> > > Number of Bricks: 4
> > > Transport-type: tcp,rdma
> > > Bricks:
> > > Brick1: pbs1ib:/bducgl
> > > Brick2: pbs2ib:/bducgl
> > > Brick3: pbs3ib:/bducgl
> > > Brick4: pbs4ib:/bducgl
> > > Options Reconfigured:
> > > performance.write-behind-window-size: 1024MB
> > > performance.flush-behind: on
> > > performance.cache-size: 268435456
> > > nfs.disable: on
> > > performance.io-thread-count: 64
> > > performance.quick-read: on
> > > performance.io-cache: on
> > > 
> > > ==============================================
> > > Some utilities claim that it was not started, even though some
> > > clients /are using the volume/ (though there are some file
> > > oddities). (From a client):
> > > 
> > > -rw-r--r-- 1 hmangala hmangala       32935 Jun 23  2010 INSTALL.txt
> > > ?--------- ? ?        ?                  ?            ? R-2.15.0
> > > drwxr-xr-x 2 hmangala hmangala          18 Sep 10 14:20 bonnie/
> > > drwxr-xr-x 2 root     root              18 Sep 10 13:41 bonnie2/
> > > 
> > > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
> > > drwx------ 2 spoorkas spoorkas  8237 Jun  3 11:22 QPSK_2Tx_2Rx_ML_Method2/
> > > drwx------ 2 spoorkas spoorkas 12288 Jun  4 01:24 QPSK_2Tx_3Rx_BH/
> > > drwx------ 2 spoorkas spoorkas  4232 Jun  2 00:26 QPSK_2Tx_3Rx_BH_Method1/
> > > drwx------ 2 spoorkas spoorkas  8274 Jun  2 00:34 QPSK_2Tx_3Rx_BH_Method2/
> > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method1
> > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method2
> > > -rw-r--r-- 1 spoorkas spoorkas     0 Apr 17 14:16 simple.sh.e1802207
> > > 
> > > (These files appear to be intact on the individual bricks, though.)
> > > 
> > > ==============================================
> > > Sat Oct 06 21:38:18 [0.76 0.71 0.58]
> > >  root@pbs2:/var/log/glusterfs/bricks
> > > 568 $ gluster volume status
> > > Volume gli is not started
> > > ==============================================
> > > 
> > > and since that is the case, other utilities also claim this:
> > > 
> > > ==============================================
> > > Sat Oct 06 21:41:25 [1.04 0.84 0.65]
> > >  root@pbs2:/var/log/glusterfs/bricks
> > > 571 $ gluster volume status gli detail
> > > Volume gli is not started
> > > ==============================================
> > > 
> > > And since they think it's not started, I can't stop it.
> > > 
> > > How is this resolvable?
> > --
> > Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> > [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> > 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> > MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> > --
> > Passive-Aggressive Supporter of the The Canada Party:
> >   <http://www.americabutbetter.com/>
> > 
> > _______________________________________________
> > Gluster-users mailing list
> > [email protected]
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > 
