Re: [Lustre-discuss] Recovery fails if clients not connected

Klaus Steden Wed, 21 Jan 2009 11:24:15 -0800

Hi Roger,

I believe you can connect the OSSs once the MDS has booted; in fact, I'm
pretty sure that the five in 'connected_clients: 0/5' are your OSS nodes.
Each OST maintains a connection to the MDS while the file system is mounted,
so the OSTs are included in the connection count on the MDS.
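
To make that count easier to inspect, here is a minimal Python sketch that
parses the recovery_status text quoted further down. The function names
parse_recovery_status and split_count are made up for illustration and are
not part of Lustre; the sample text is copied from Roger's output.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines from a recovery_status dump into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore any line that has no colon
            fields[key.strip()] = value.strip()
    return fields


def split_count(value):
    """Split a 'done/total' field such as '0/5' into two integers."""
    done, total = value.split("/")
    return int(done), int(total)


# Sample copied from the output quoted in this thread.
SAMPLE = """\
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/5
completed_clients: 0/5
replayed_requests: 0/??
queued_requests: 0
next_transno: 17768
"""

fields = parse_recovery_status(SAMPLE)
connected, expected = split_count(fields["connected_clients"])
print(f"{fields['status']}: {connected} of {expected} expected connections")
```

Note that split_count is only safe on fields whose two halves are numeric;
replayed_requests shows '0/??' here and would need separate handling.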


However, regardless of the state, if your MDS is online and the MDT is
mounted, you can start up the OSS nodes and corresponding OSTs at any time;
clients attempting to make transactions will have their I/O operations block
(or fail, depending on the MDS config) until the missing nodes come back
online.
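
If you want to watch recovery after starting the OSTs, something like the
following could poll the status. This is only a sketch: wait_for_recovery and
read_from_proc are hypothetical helper names, the poll interval and retry
count are arbitrary, and the default path is the one from Roger's system. It
only checks that the status line has moved off RECOVERING and makes no
assumption about what the next value will be.

```python
import time


def wait_for_recovery(read_status, poll_seconds=5.0, max_polls=60):
    """Poll until the 'status:' line is no longer RECOVERING.

    read_status is any callable that returns the recovery_status text.
    Returns the new status string, or None if max_polls is exhausted.
    """
    for attempt in range(max_polls):
        for line in read_status().splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "status" and value.strip() != "RECOVERING":
                return value.strip()
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)  # wait before the next read
    return None


def read_from_proc(path="/proc/fs/lustre/mds/tacc-MDT0000/recovery_status"):
    """Read the file quoted in this thread; the path is site-specific."""
    with open(path) as handle:
        return handle.read()
```

On the MDS this would be called as wait_for_recovery(read_from_proc). Passing
the reader in as a callable keeps the loop testable on a machine without
Lustre.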

hth,
Klaus


On 1/20/09 3:05 PM, "Roger Spellman" <[email protected]> etched on stone
tablets:

> I have 2 MDS, configured as an active/standby pair.  I have 5 OSTs that are
> NOT active/standby.  I have 5 clients.
>  
> I am using Lustre 1.6.5, due to bug 18232
> <https://bugzilla.lustre.org/show_bug.cgi?id=18232>, which only affects 1.6.6.
> Using Lustre 1.6.5, when I reset my active node, the standby takes over.
> This is quite reliable.
>  
> Today, I did the following in this order:
>   Unmounted all the clients
>   Rebooted all the clients
>   Stopped Linux HA from running
>   Unmounted the OSTs
>   Unmounted the MDS
>   Rebooted the OSTs
>   Rebooted both MDSes
>  
> When the MDSes started up, Linux HA chose one to be active.  That system
> mounted the MDT.
>  
> I looked at the file  /proc/fs/lustre/mds/tacc-MDT0000/recovery_status, and it
> showed:
>  
> [r...@ts-tacc-01 ~]# cat /proc/fs/lustre/mds/tacc-MDT0000/recovery_status
> status: RECOVERING
> recovery_start: 0
> time_remaining: 0
> connected_clients: 0/5
> completed_clients: 0/5
> replayed_requests: 0/??
> queued_requests: 0
> next_transno: 17768
>  
>  
> ***** Note that recovery_start and time_remaining are both zero. *****
>  
> I waited several minutes, and this file was the same.
>  
> I was waiting for recovery to complete before trying to mount the OSTs.
> However, it appears that this would never occur!
>  
> Does this look like a bug?
>  
> ---------------------------
>  
> I format my MDT using the following command.  The command is run from
> 10.2.43.1, and the failnode is 10.2.43.2:
>  
> mkfs.lustre --reformat --fsname tacc --mdt --mgs --device-size=10000000 \
>   --mkfsoptions=' -m 0 -O mmp' --failnode=10.2.4...@o2ib0 /dev/sdb
>  
> I format the OSTs using the following command:
>  
> /usr/bin/time -p mkfs.lustre --reformat --ost \
>   --mkfsoptions='-J device=/dev/sdc1 -m 0' --fsname tacc \
>   --device-size=400000000 --mgsnode=10.2.4...@o2ib0 \
>   --mgsnode=10.2.4...@o2ib0 /dev/sdb
>  
> I mount the clients using:
>  
> mount -t lustre 10.2.4...@o2ib:10.2.4...@o2ib:/tacc /mnt/lustre
>  
> 
> 
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

