And a few more data points: it appears the reason for the flaky glusterfs is that not all of the servers are running glusterfsds (see below). Is there a way to force all the servers to start their glusterfsds, as they're supposed to?
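(One thing I'm tempted to try, tho I haven't yet: I believe gluster 3.3 accepts a 'force' flag on volume start, which is supposed to respawn any brick daemons that aren't running without disturbing the ones that are. Treat the exact behavior as an assumption on my part:)

```shell
# Hedged suggestion: ask glusterd to (re)start any missing brick
# daemons for the 'gli' volume; needs a live gluster cluster to run.
gluster volume start gli force
```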
The mystery rebalance did complete, and seems to have fixed some but not all problem files, ie:

> drwx------ 2 spoorkas spoorkas 8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1

And the started/not-started status has gotten weirder, if possible. The gluster volume is still being exported to clients, despite gluster insisting that the volume is not started (servers are pbs[1234]).

Result of $ gluster volume status:
pbs1: Volume gli is not started
pbs2: Volume gli is not started
pbs3: Volume gli is not started
pbs4: Volume gli is not started

$ gluster volume info:
pbs1: Status: Stopped
pbs2: Status: Started  <- aha!
pbs3: Status: Started  <- aha!
pbs4: Status: Started

This correlates with the glusterfsd status, in which only pbs[23] are running glusterfsd:

pbs2: root 1799 0.1 0.0 184296 16464 ? Ssl 13:07 0:06 /usr/sbin/glusterfsd
      -s localhost --volfile-id gli.pbs2ib.bducgl
      -p /var/lib/glusterd/vols/gli/run/pbs2ib-bducgl.pid
      -S /tmp/c70b2f910e2fe1bb485b1d76ef63e3db.socket
      --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
      --xlator-option *-posix.glusterd-uuid=26de63bd-c5b7-48ba-b81d-5d77a533d077
      --brick-port 24025 24026
      --xlator-option gli-server.transport.rdma.listen-port=24026
      --xlator-option gli-server.listen-port=24025

pbs3: root 1751 0.1 0.0 184168 16468 ? Ssl 13:07 0:06 /usr/sbin/glusterfsd
      -s localhost --volfile-id gli.pbs3ib.bducgl
      -p /var/lib/glusterd/vols/gli/run/pbs3ib-bducgl.pid
      -S /tmp/7096377992feb7f5a7805cafd82c3100.socket
      --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
      --xlator-option *-posix.glusterd-uuid=c79c4084-d6b9-4af9-b975-40dd6aa99b42
      --brick-port 24018 24020
      --xlator-option gli-server.transport.rdma.listen-port=24020
      --xlator-option gli-server.listen-port=24018

pbs[14] are only running the glusterd process, not any glusterfsds. In previous startups, pbs4 WAS running a glusterfsd, but pbs1 has not run one since the powerdown, AFAIK.
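(For the record, this is roughly how I'm checking which servers have a brick daemon up; it assumes passwordless root ssh from the admin host to the pbs servers, so it won't run anywhere else as-is:)

```shell
#!/bin/bash
# Report which gluster servers are running a brick daemon (glusterfsd).
# Assumes passwordless root ssh to pbs1..pbs4 (hypothetical admin host setup).
for h in pbs1 pbs2 pbs3 pbs4; do
    if ssh "$h" pgrep -x glusterfsd >/dev/null 2>&1; then
        echo "$h: glusterfsd running"
    else
        echo "$h: NO glusterfsd"
    fi
done
```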
hjm

On Saturday, October 06, 2012 10:19:14 PM harry mangalam wrote:
> ...and should have added:
>
> the rebalance log (the volume claimed to be rebalancing before I shut it
> down but was idle or wedged at that time) is active as well, with about 1
> warning of a "1 subvolumes down -- not fixing" for every 3 informational
> messages:
>
> [2012-10-06 22:05:35.396650] I [dht-rebalance.c:1058:gf_defrag_migrate_data]
> 0-gli-dht: migrate data called on
> /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/tmp/wcprops
>
> [2012-10-06 22:05:35.451925] I [dht-layout.c:593:dht_layout_normalize]
> 0-gli-dht: found anomalies in
> /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/wcprops.
> holes=1 overlaps=0
>
> [2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:dht_selfheal_directory]
> 0-gli-dht: 1 subvolumes down -- not fixing
>
> previously...
>
> gluster 3.3, running on ubuntu 10.04, was running OK; had to shut down for
> a power outage.
>
> When I tried to shut it down, it insisted that it was rebalancing, but
> seemed wedged - no activity in the logs.
>
> Was able to shut it down tho.
>
> After power was restored, tried to restart the volume but altho the 4 peers
> claimed to be visible and could ping each other etc:
> ==============================================
> Sat Oct 06 21:38:07 [0.81 0.71 0.58] root@pbs2:/var/log/glusterfs/bricks
> 567 $ gluster peer status
> Number of Peers: 3
>
> Hostname: pbs3ib
> Uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
> State: Peer in Cluster (Connected)
>
> Hostname: 10.255.77.2
> Uuid: 3fcd023c-9cc9-4d1c-84c4-babfb4492e38
> State: Peer in Cluster (Connected)
>
> Hostname: pbs4ib
> Uuid: 2a593581-bf45-446c-8f7c-212c53297803
> State: Peer in Cluster (Connected)
> ==============================================
>
> and the volume info seemed to be OK:
> ==============================================
> Sat Oct 06 21:36:11 [0.75 0.67 0.56] root@pbs2:/var/log/glusterfs/bricks
> 565 $ gluster volume info gli
>
> Volume Name: gli
> Type: Distribute
> Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
> Status: Started
> Number of Bricks: 4
> Transport-type: tcp,rdma
> Bricks:
> Brick1: pbs1ib:/bducgl
> Brick2: pbs2ib:/bducgl
> Brick3: pbs3ib:/bducgl
> Brick4: pbs4ib:/bducgl
> Options Reconfigured:
> performance.write-behind-window-size: 1024MB
> performance.flush-behind: on
> performance.cache-size: 268435456
> nfs.disable: on
> performance.io-thread-count: 64
> performance.quick-read: on
> performance.io-cache: on
> ==============================================
>
> some utilities claim that it was not started, even tho some clients /are
> using the volume/ (tho there are some file oddities)
> (from a client):
>
> -rw-r--r-- 1 hmangala hmangala 32935 Jun 23  2010 INSTALL.txt
> ?--------- ? ?        ?            ?           ? R-2.15.0
> drwxr-xr-x 2 hmangala hmangala    18 Sep 10 14:20 bonnie/
> drwxr-xr-x 2 root     root        18 Sep 10 13:41 bonnie2/
>
> drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
> drwx------ 2 spoorkas spoorkas  8237 Jun  3 11:22 QPSK_2Tx_2Rx_ML_Method2/
> drwx------ 2 spoorkas spoorkas 12288 Jun  4 01:24 QPSK_2Tx_3Rx_BH/
> drwx------ 2 spoorkas spoorkas  4232 Jun  2 00:26 QPSK_2Tx_3Rx_BH_Method1/
> drwx------ 2 spoorkas spoorkas  8274 Jun  2 00:34 QPSK_2Tx_3Rx_BH_Method2/
> ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method1
> ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method2
> -rw-r--r-- 1 spoorkas spoorkas     0 Apr 17 14:16 simple.sh.e1802207
>
> (These files appear to be intact on the individual bricks tho.)
>
> ==============================================
> Sat Oct 06 21:38:18 [0.76 0.71 0.58] root@pbs2:/var/log/glusterfs/bricks
> 568 $ gluster volume status
> Volume gli is not started
> ==============================================
>
> and since that is the case, other utilities also claim this:
>
> ==============================================
> Sat Oct 06 21:41:25 [1.04 0.84 0.65] root@pbs2:/var/log/glusterfs/bricks
> 571 $ gluster volume status gli detail
> Volume gli is not started
> ==============================================
>
> And since they think it's not started, I can't stop it.
>
> How is this resolvable?

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
Passive-Aggressive Supporter of the The Canada Party:
<http://www.americabutbetter.com/>

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
