I suspect this didn't go through - forwarding.
----- Original Message -----
> Harry,
>
> Could you paste/attach the contents of the /var/lib/glusterd/vols/gli/info
> files and the glusterd log files from the 4 peers in the cluster?
> From the volume-info snippet you pasted, it appears that the node
> which was shut down differs in its view of the volume's status.
>
> thanks,
> krish
>
> ----- Original Message -----
> > From: "harry mangalam" <[email protected]>
> > To: [email protected]
> > Sent: Monday, October 8, 2012 2:49:05 AM
> > Subject: Re: [Gluster-users] volume started but not 'startable', not 'stoppable'
> >
> > And a few more data points: it appears the reason for the flaky
> > gluster fs is that not all the servers are running glusterfsd's (see
> > below). Is there a way to force the servers to all start the
> > glusterfsd's, as they're supposed to?
> >
> > The mystery rebalance did complete, and seems to have fixed some but
> > not all problem files - i.e.:
> >
> > > drwx------ 2 spoorkas spoorkas 8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > > ?--------- ? ?        ?           ?            ? QPSK_2Tx_2Rx_ML_Method1
> >
> > And the started/not-started status has gotten weirder, if possible...
> >
> > The gluster volume is still being exported to clients, despite
> > gluster insisting that the volume is not started (servers are
> > pbs[1234]); result of:
> >
> > $ gluster volume status
> > pbs1: Volume gli is not started
> > pbs2: Volume gli is not started
> > pbs3: Volume gli is not started
> > pbs4: Volume gli is not started
> >
> > $ gluster volume info:
> > pbs1: Status: Stopped
> > pbs2: Status: Started  <- aha!
> > pbs3: Status: Started  <- aha!
> > pbs4: Status: Started
> >
> > This correlates with the glusterfsd status, in which only pbs[23]
> > are running glusterfsd:
> >
> > pbs2: root 1799 0.1 0.0 184296 16464 ? Ssl 13:07 0:06
> > /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs2ib.bducgl -p
> > /var/lib/glusterd/vols/gli/run/pbs2ib-bducgl.pid -S
> > /tmp/c70b2f910e2fe1bb485b1d76ef63e3db.socket --brick-name /bducgl -l
> > /var/log/glusterfs/bricks/bducgl.log --xlator-option
> > *-posix.glusterd-uuid=26de63bd-c5b7-48ba-b81d-5d77a533d077
> > --brick-port 24025 24026 --xlator-option
> > gli-server.transport.rdma.listen-port=24026 --xlator-option
> > gli-server.listen-port=24025
> >
> > pbs3: root 1751 0.1 0.0 184168 16468 ? Ssl 13:07 0:06
> > /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs3ib.bducgl -p
> > /var/lib/glusterd/vols/gli/run/pbs3ib-bducgl.pid -S
> > /tmp/7096377992feb7f5a7805cafd82c3100.socket --brick-name /bducgl -l
> > /var/log/glusterfs/bricks/bducgl.log --xlator-option
> > *-posix.glusterd-uuid=c79c4084-d6b9-4af9-b975-40dd6aa99b42
> > --brick-port 24018 24020 --xlator-option
> > gli-server.transport.rdma.listen-port=24020 --xlator-option
> > gli-server.listen-port=24018
> >
> > pbs[14] are only running the glusterd process, not any glusterfsd's.
> >
> > In previous startups, pbs4 WAS running a glusterfsd, but pbs1 has
> > not run one since the powerdown, AFAIK.
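[Editor's note] A minimal sketch of how one might answer the "force the servers to start their glusterfsd's" question above. The `needs_brick_restart` helper and the host loop are hypothetical (not from the thread); `gluster volume start <vol> force` is the usual way to ask glusterd to respawn missing brick processes without disturbing bricks that are already up, but verify its behavior on your 3.3 install first.

```shell
# Hypothetical helper: takes the output of `pgrep -f glusterfsd` on a host;
# empty output means no brick daemon is running there.
needs_brick_restart() {
    if [ -z "$1" ]; then echo "restart"; else echo "ok"; fi
}

# Intended use on the cluster (assumes passwordless ssh to the peers and
# the volume name "gli" from the thread):
#   for h in pbs1 pbs2 pbs3 pbs4; do
#       pids=$(ssh "$h" pgrep -f glusterfsd)
#       [ "$(needs_brick_restart "$pids")" = restart ] && \
#           echo "$h: no glusterfsd; try: gluster volume start gli force"
#   done
```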
> >
> > hjm
> >
> >
> > On Saturday, October 06, 2012 10:19:14 PM harry mangalam wrote:
> > > ...and should have added:
> > >
> > > The rebalance log (the volume claimed to be rebalancing before I
> > > shut it down, but was idle or wedged at that time) is active as
> > > well, with about one "1 subvolumes down -- not fixing" warning for
> > > every 3 informational messages:
> > >
> > > [2012-10-06 22:05:35.396650] I [dht-rebalance.c:1058:gf_defrag_migrate_data]
> > > 0-gli-dht: migrate data called on /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/tmp/wcprops
> > >
> > > [2012-10-06 22:05:35.451925] I [dht-layout.c:593:dht_layout_normalize]
> > > 0-gli-dht: found anomalies in /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/wcprops. holes=1 overlaps=0
> > >
> > > [2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:dht_selfheal_directory]
> > > 0-gli-dht: 1 subvolumes down -- not fixing
> > >
> > >
> > > previously...
> > >
> > > gluster 3.3, running on Ubuntu 10.04, was running OK; had to shut
> > > it down for a power outage.
> > >
> > > When I tried to shut it down, it insisted that it was rebalancing,
> > > but seemed wedged - no activity in the logs.
> > >
> > > Was able to shut it down, tho.
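[Editor's note] The "about 1 warning per 3 informational messages" ratio mentioned above can be measured rather than eyeballed. A hypothetical helper (not from the thread); the match strings come from the log excerpts, and the log path for this volume would be something like /var/log/glusterfs/gli-rebalance.log, which should be checked locally.

```shell
# Count "subvolumes down -- not fixing" warnings vs. ordinary
# migrate-data messages in a rebalance log, to gauge how widespread
# the down-subvolume condition is.
summarize_rebal_log() {
    warns=$(grep -c 'subvolumes down -- not fixing' "$1")
    infos=$(grep -c 'migrate data called on' "$1")
    echo "warnings=$warns infos=$infos"
}
```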
> > >
> > > After power was restored, tried to restart the volume, but altho
> > > the 4 peers claimed to be visible and could ping each other, etc.:
> > > ==============================================
> > > Sat Oct 06 21:38:07 [0.81 0.71 0.58] root@pbs2:/var/log/glusterfs/bricks
> > > 567 $ gluster peer status
> > > Number of Peers: 3
> > >
> > > Hostname: pbs3ib
> > > Uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
> > > State: Peer in Cluster (Connected)
> > >
> > > Hostname: 10.255.77.2
> > > Uuid: 3fcd023c-9cc9-4d1c-84c4-babfb4492e38
> > > State: Peer in Cluster (Connected)
> > >
> > > Hostname: pbs4ib
> > > Uuid: 2a593581-bf45-446c-8f7c-212c53297803
> > > State: Peer in Cluster (Connected)
> > > ==============================================
> > >
> > > and the volume info seemed to be OK:
> > > ==============================================
> > > Sat Oct 06 21:36:11 [0.75 0.67 0.56] root@pbs2:/var/log/glusterfs/bricks
> > > 565 $ gluster volume info gli
> > >
> > > Volume Name: gli
> > > Type: Distribute
> > > Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
> > > Status: Started
> > > Number of Bricks: 4
> > > Transport-type: tcp,rdma
> > > Bricks:
> > > Brick1: pbs1ib:/bducgl
> > > Brick2: pbs2ib:/bducgl
> > > Brick3: pbs3ib:/bducgl
> > > Brick4: pbs4ib:/bducgl
> > > Options Reconfigured:
> > > performance.write-behind-window-size: 1024MB
> > > performance.flush-behind: on
> > > performance.cache-size: 268435456
> > > nfs.disable: on
> > > performance.io-thread-count: 64
> > > performance.quick-read: on
> > > performance.io-cache: on
> > > ==============================================
> > >
> > > some utilities claim that it was not started, even tho some clients
> > > /are using the volume/ (tho there are some file oddities)
> > > (from a client):
> > >
> > > -rw-r--r-- 1 hmangala hmangala 32935 Jun 23  2010 INSTALL.txt
> > > ?--------- ? ?        ?           ?            ? R-2.15.0
> > > drwxr-xr-x 2 hmangala hmangala    18 Sep 10 14:20 bonnie/
> > > drwxr-xr-x 2 root     root        18 Sep 10 13:41 bonnie2/
> > >
> > > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > > ?--------- ? ?        ?           ?            ? QPSK_2Tx_2Rx_ML_Method1
> > > drwx------ 2 spoorkas spoorkas  8237 Jun  3 11:22 QPSK_2Tx_2Rx_ML_Method2/
> > > drwx------ 2 spoorkas spoorkas 12288 Jun  4 01:24 QPSK_2Tx_3Rx_BH/
> > > drwx------ 2 spoorkas spoorkas  4232 Jun  2 00:26 QPSK_2Tx_3Rx_BH_Method1/
> > > drwx------ 2 spoorkas spoorkas  8274 Jun  2 00:34 QPSK_2Tx_3Rx_BH_Method2/
> > > ?--------- ? ?        ?           ?            ? QPSK_2Tx_3Rx_ML_Method1
> > > ?--------- ? ?        ?           ?            ? QPSK_2Tx_3Rx_ML_Method2
> > > -rw-r--r-- 1 spoorkas spoorkas     0 Apr 17 14:16 simple.sh.e1802207
> > >
> > > (These files appear to be intact on the individual bricks, tho.)
> > >
> > > ==============================================
> > > Sat Oct 06 21:38:18 [0.76 0.71 0.58] root@pbs2:/var/log/glusterfs/bricks
> > > 568 $ gluster volume status
> > > Volume gli is not started
> > > ==============================================
> > >
> > > and since that is the case, other utilities also claim this:
> > >
> > > ==============================================
> > > Sat Oct 06 21:41:25 [1.04 0.84 0.65] root@pbs2:/var/log/glusterfs/bricks
> > > 571 $ gluster volume status gli detail
> > > Volume gli is not started
> > > ==============================================
> > >
> > > And since they think it's not started, I can't stop it.
> > >
> > > How is this resolvable?
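[Editor's note] The "?---------" rows in the listings above are directory entries that readdir() returns but stat() cannot resolve through the mount. A hypothetical sketch (not from the thread) for enumerating such entries so they can then be compared against the copies on the individual bricks; a dangling symlink reproduces the same un-stat'able condition on a local filesystem.

```shell
# Walk one directory level and report entries whose stat() fails --
# these are the ones ls shows as "?---------".
# On a healthy directory this prints nothing.
list_unstatable() {
    for e in "$1"/*; do
        [ -e "$e" ] || echo "broken: $e"
    done
}
```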
> > --
> > Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> > [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> > 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> > MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> > --
> > Passive-Aggressive Supporter of The Canada Party:
> > <http://www.americabutbetter.com/>

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
