Sorry for not responding immediately - I was drowning in flopsweat trying to
get it back up.  There were some false starts, mostly due to mounting too
early, while the volfile was still inconsistent.

Remounting after glusterfs came back and re-established a valid volfile
seems to have resolved everything.

Thanks very much for the help.

hjm



On Monday, October 08, 2012 02:30:07 PM John Mark Walker wrote:
> I suspect this didn't go through - forwarding.
> 
> ----- Original Message -----
> 
> > Harry,
> > 
> > Could you paste/attach the contents of the
> > /var/lib/glusterd/vols/gli/info files and the glusterd log files
> > from the 4 peers in the cluster? From the volume-info snippet you
> > pasted, it appears that the node which was shut down differs in its
> > view of the volume's status.
> > 
> > thanks,
> > krish
> > 
> > ----- Original Message -----
> > 
> > > From: "harry mangalam" <[email protected]>
> > > To: [email protected]
> > > Sent: Monday, October 8, 2012 2:49:05 AM
> > > Subject: Re: [Gluster-users] volume started but not 'startable',
> > >   not 'stoppable'
> > > 
> > > And a few more data points: it appears the reason for the flaky
> > > gluster fs is that not all the servers are running glusterfsd's
> > > (see below).  Is there a way to force the servers to all start the
> > > glusterfsd's as they're supposed to?
> > > 
> > > The mystery rebalance did complete, and seems to have fixed some
> > > but not all problem files - ie:
> > > > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
> > > 
> > > And the started/not started status has gotten weirder if possible.
> > > 
> > > The gluster volume is still being exported to clients, despite
> > > gluster insisting that the volume is not started (servers are
> > > pbs[1234]).  Result of:
> > > $ gluster volume status
> > > pbs1:Volume gli is not started
> > > pbs2:Volume gli is not started
> > > pbs3:Volume gli is not started
> > > pbs4:Volume gli is not started
> > > 
> > > $ gluster volume info:
> > > pbs1:Status: Stopped
> > > pbs2:Status: Started  <- aha!
> > > pbs3:Status: Started  <- aha!
> > > pbs4:Status: Started
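
The disagreement between servers in the listing above can be checked mechanically. A minimal Python sketch, assuming the per-server output has already been collected as `host:Status: ...` lines in the format shown (how it gets collected is not covered here, and the function name `group_hosts_by_status` is hypothetical):

```python
from collections import defaultdict


def group_hosts_by_status(lines):
    """Map each reported volume status to the hosts reporting it.

    Assumes host-prefixed lines such as 'pbs1:Status: Stopped'.
    Anything after the status word (e.g. a trailing note) is ignored.
    """
    groups = defaultdict(list)
    for line in lines:
        host, sep, rest = line.strip().partition(":")
        if not sep or not rest.startswith("Status:"):
            continue  # not a host-prefixed status line
        words = rest[len("Status:"):].split()
        if words:
            groups[words[0]].append(host)
    return dict(groups)


if __name__ == "__main__":
    collected = [
        "pbs1:Status: Stopped",
        "pbs2:Status: Started",
        "pbs3:Status: Started",
        "pbs4:Status: Started",
    ]
    groups = group_hosts_by_status(collected)
    if len(groups) > 1:
        print("peers disagree on volume status:", groups)
```

More than one key in the result means the peers do not share a single view of the volume, which is the condition that showed up here.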
> > > 
> > > This correlates with the glusterfsd status in which only pbs[23]
> > > are running glusterfsd:
> > > 
> > > pbs2:root      1799  0.1  0.0 184296 16464 ?        Ssl  13:07   0:06
> > >   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs2ib.bducgl
> > >   -p /var/lib/glusterd/vols/gli/run/pbs2ib-bducgl.pid
> > >   -S /tmp/c70b2f910e2fe1bb485b1d76ef63e3db.socket --brick-name /bducgl
> > >   -l /var/log/glusterfs/bricks/bducgl.log
> > >   --xlator-option *-posix.glusterd-uuid=26de63bd-c5b7-48ba-b81d-5d77a533d077
> > >   --brick-port 24025 24026
> > >   --xlator-option gli-server.transport.rdma.listen-port=24026
> > >   --xlator-option gli-server.listen-port=24025
> > > 
> > > pbs3:root      1751  0.1  0.0 184168 16468 ?        Ssl  13:07   0:06
> > >   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs3ib.bducgl
> > >   -p /var/lib/glusterd/vols/gli/run/pbs3ib-bducgl.pid
> > >   -S /tmp/7096377992feb7f5a7805cafd82c3100.socket --brick-name /bducgl
> > >   -l /var/log/glusterfs/bricks/bducgl.log
> > >   --xlator-option *-posix.glusterd-uuid=c79c4084-d6b9-4af9-b975-40dd6aa99b42
> > >   --brick-port 24018 24020
> > >   --xlator-option gli-server.transport.rdma.listen-port=24020
> > >   --xlator-option gli-server.listen-port=24018
> > > 
> > > pbs[14] are only running the glusterd process, not any glusterfsd's
> > > 
> > > In previous startups, pbs4 WAS running a glusterfsd, but pbs1 has
> > > not run one since the powerdown AFAIK.
> > > 
> > > hjm
> > > 
> > > On Saturday, October 06, 2012 10:19:14 PM harry mangalam wrote:
> > > > ...and should have added:
> > > > 
> > > > the rebalance log (the volume claimed to be rebalancing before I
> > > > shut it down but was idle or wedged at that time) is active as
> > > > well with about 1 warning of a "1 subvolumes down -- not fixing"
> > > > for every 3 informational messages:
> > > > 
> > > > [2012-10-06 22:05:35.396650] I [dht-rebalance.c:1058:gf_defrag_migrate_data] 0-gli-dht: migrate data called on /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/tmp/wcprops
> > > > 
> > > > [2012-10-06 22:05:35.451925] I [dht-layout.c:593:dht_layout_normalize] 0-gli-dht: found anomalies in /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/wcprops. holes=1 overlaps=0
> > > > 
> > > > [2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:dht_selfheal_directory] 0-gli-dht: 1 subvolumes down -- not fixing
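
The warning-to-info ratio mentioned above can be tallied rather than eyeballed. A rough Python sketch, assuming each log entry sits on one line in the `[timestamp] LEVEL [source] message` layout of the excerpt; the function name `count_severities` and the regular expression are illustrative assumptions, not part of gluster:

```python
import re
from collections import Counter

# Matches the start of an entry such as:
# [2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:...] ...
ENTRY_RE = re.compile(
    r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\] ([A-Z]) \["
)


def count_severities(lines):
    """Count log entries per one-letter severity (I, W, E, ...)."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2012-10-06 22:05:35.396650] I [dht-rebalance.c:1058:gf_defrag_migrate_data] 0-gli-dht: migrate data called",
        "[2012-10-06 22:05:35.451925] I [dht-layout.c:593:dht_layout_normalize] 0-gli-dht: found anomalies",
        "[2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:dht_selfheal_directory] 0-gli-dht: 1 subvolumes down -- not fixing",
    ]
    print(dict(count_severities(sample)))
```

Run over the whole rebalance log, the `W` count relative to the `I` count gives the ratio directly. Lines that do not start with a bracketed timestamp are skipped.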
> > > > 
> > > > 
> > > > previously...
> > > > 
> > > > gluster 3.3, running on ubuntu 10.04, was running OK, had to shut
> > > > down for a power outage.
> > > > 
> > > > When I tried to shut it down, it insisted that it was rebalancing,
> > > > but seemed wedged - no activity in the logs.
> > > > 
> > > > Was able to shut it down tho.
> > > > 
> > > > After power was restored, tried to restart the volume but altho
> > > > the 4 peers claimed to be visible and could ping each other etc:
> > > > ==============================================
> > > > Sat Oct 06 21:38:07 [0.81 0.71 0.58]
> > > > 
> > > >  root@pbs2:/var/log/glusterfs/bricks
> > > > 
> > > > 567 $ gluster peer status
> > > > Number of Peers: 3
> > > > 
> > > > Hostname: pbs3ib
> > > > Uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
> > > > State: Peer in Cluster (Connected)
> > > > 
> > > > Hostname: 10.255.77.2
> > > > Uuid: 3fcd023c-9cc9-4d1c-84c4-babfb4492e38
> > > > State: Peer in Cluster (Connected)
> > > > 
> > > > Hostname: pbs4ib
> > > > Uuid: 2a593581-bf45-446c-8f7c-212c53297803
> > > > State: Peer in Cluster (Connected)
> > > > ==============================================
> > > > 
> > > > and the volume info seemed to be OK:
> > > > ==============================================
> > > > Sat Oct 06 21:36:11 [0.75 0.67 0.56]
> > > > 
> > > >  root@pbs2:/var/log/glusterfs/bricks
> > > > 
> > > > 565 $ gluster volume info gli
> > > > 
> > > > Volume Name: gli
> > > > Type: Distribute
> > > > Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
> > > > Status: Started
> > > > Number of Bricks: 4
> > > > Transport-type: tcp,rdma
> > > > Bricks:
> > > > Brick1: pbs1ib:/bducgl
> > > > Brick2: pbs2ib:/bducgl
> > > > Brick3: pbs3ib:/bducgl
> > > > Brick4: pbs4ib:/bducgl
> > > > Options Reconfigured:
> > > > performance.write-behind-window-size: 1024MB
> > > > performance.flush-behind: on
> > > > performance.cache-size: 268435456
> > > > nfs.disable: on
> > > > performance.io-thread-count: 64
> > > > performance.quick-read: on
> > > > performance.io-cache: on
> > > > 
> > > > ==============================================
> > > > some utilities claim that it was not started, even tho some
> > > > clients /are using the volume/ (tho there are some file oddities)
> > > > (from a client):
> > > > 
> > > > -rw-r--r-- 1 hmangala hmangala       32935 Jun 23  2010 INSTALL.txt
> > > > ?--------- ? ?        ?                  ?            ? R-2.15.0
> > > > drwxr-xr-x 2 hmangala hmangala          18 Sep 10 14:20 bonnie/
> > > > drwxr-xr-x 2 root     root              18 Sep 10 13:41 bonnie2/
> > > > 
> > > > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
> > > > drwx------ 2 spoorkas spoorkas  8237 Jun  3 11:22 QPSK_2Tx_2Rx_ML_Method2/
> > > > drwx------ 2 spoorkas spoorkas 12288 Jun  4 01:24 QPSK_2Tx_3Rx_BH/
> > > > drwx------ 2 spoorkas spoorkas  4232 Jun  2 00:26 QPSK_2Tx_3Rx_BH_Method1/
> > > > drwx------ 2 spoorkas spoorkas  8274 Jun  2 00:34 QPSK_2Tx_3Rx_BH_Method2/
> > > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method1
> > > > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method2
> > > > -rw-r--r-- 1 spoorkas spoorkas     0 Apr 17 14:16 simple.sh.e1802207
> > > > 
> > > > (These files appear to be intact on the individual bricks tho.)
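
To list the affected entries from a client, the `?---------` rows can be filtered out of an `ls -l` listing. A small Python sketch, assuming one entry per line with the name on the same line as its metadata and the seven-column layout seen above (mode, links, owner, group, size, one `?` for the date, name); `unreadable_entries` is a hypothetical helper name:

```python
def unreadable_entries(ls_lines):
    """Return names of entries whose metadata `ls -l` could not read.

    Such rows start with '?' in the mode column and show '?' for the
    links, owner, group, size and date columns, so the name starts at
    the seventh whitespace-separated field.
    """
    names = []
    for line in ls_lines:
        fields = line.split()
        if len(fields) > 6 and fields[0].startswith("?"):
            names.append(" ".join(fields[6:]))
    return names


if __name__ == "__main__":
    listing = [
        "-rw-r--r-- 1 hmangala hmangala 32935 Jun 23  2010 INSTALL.txt",
        "?--------- ? ?        ?            ?            ? R-2.15.0",
        "drwxr-xr-x 2 hmangala hmangala    18 Sep 10 14:20 bonnie/",
    ]
    for name in unreadable_entries(listing):
        print(name)
```

Names containing runs of spaces come back with single spaces, which is good enough for spotting problem files but not for feeding the result to other tools.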
> > > > 
> > > > ==============================================
> > > > Sat Oct 06 21:38:18 [0.76 0.71 0.58]
> > > > 
> > > >  root@pbs2:/var/log/glusterfs/bricks
> > > > 
> > > > 568 $ gluster volume status
> > > > Volume gli is not started
> > > > ==============================================
> > > > 
> > > > and since that is the case, other utilities also claim this:
> > > > 
> > > > ==============================================
> > > > Sat Oct 06 21:41:25 [1.04 0.84 0.65]
> > > > 
> > > >  root@pbs2:/var/log/glusterfs/bricks
> > > > 
> > > > 571 $ gluster volume status gli detail
> > > > Volume gli is not started
> > > > ==============================================
> > > > 
> > > > And since they think it's not started, I can't stop it.
> > > > 
> > > > How is this resolvable?
> > > 
> > > --
> > > Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> > > [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
> > > 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> > > MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> > > --
> > > 
> > > Passive-Aggressive Supporter of the The Canada Party:
> > >   <http://www.americabutbetter.com/>
> > > 
> > > _______________________________________________
> > > Gluster-users mailing list
> > > [email protected]
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
Passive-Aggressive Supporter of the The Canada Party:
  <http://www.americabutbetter.com/>

