I have two MDSes configured as an active/standby pair, five OSTs that
are NOT active/standby, and five clients.
 
I am using Lustre 1.6.5, because bug 18232
<https://bugzilla.lustre.org/show_bug.cgi?id=18232> only affects 1.6.6.
With 1.6.5, when I reset the active node, the standby takes over.  This
is quite reliable.
 
Today, I did the following in this order:
  Unmounted all the clients
  Rebooted all the clients
  Stopped Linux HA from running
  Unmounted the OSTs
  Unmounted the MDS
  Rebooted the OSTs
  Rebooted both MDSes
 
When the MDSes started up, Linux HA chose one to be active.  That system
mounted the MDT.
 
I looked at the file  /proc/fs/lustre/mds/tacc-MDT0000/recovery_status,
and it showed:
 
[r...@ts-tacc-01 ~]# cat /proc/fs/lustre/mds/tacc-MDT0000/recovery_status
status: RECOVERING
recovery_start: 0
time_remaining: 0
connected_clients: 0/5
completed_clients: 0/5
replayed_requests: 0/??
queued_requests: 0
next_transno: 17768
 
 
***** Note that recovery_start and time_remaining are both zero. *****
 
I waited several minutes, and the file did not change.
 
I was waiting for recovery to complete before trying to mount the OSTs.
However, it appears that this would never occur!
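For what it's worth, the wait I was doing by hand can be scripted.  This
is just a sketch; the proc path is the one shown above, and the 5-second
interval and 5-minute limit are arbitrary choices of mine:

```shell
#!/bin/sh
# Print the "status:" field (RECOVERING, COMPLETE, ...) from a Lustre
# recovery_status proc file passed as $1.
recovery_state() {
    awk '/^status:/ {print $2}' "$1"
}

# Poll until the MDT leaves RECOVERING, checking every 5 s for up to 5 min.
STATUS_FILE=/proc/fs/lustre/mds/tacc-MDT0000/recovery_status
if [ -r "$STATUS_FILE" ]; then
    i=0
    while [ "$i" -lt 60 ]; do
        state=$(recovery_state "$STATUS_FILE")
        [ "$state" != "RECOVERING" ] && break
        sleep 5
        i=$((i + 1))
    done
    echo "final state: $state"
fi
```

In my case the loop would have run to the 5-minute limit with the state
still RECOVERING, which is exactly the symptom.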
 
Does this look like a bug? 
 
---------------------------
 
I format my MDT using the following command.  The command is run from
10.2.43.1, and the failnode is 10.2.43.2:
 
mkfs.lustre --reformat --fsname tacc --mdt --mgs --device-size=10000000 --mkfsoptions=' -m 0 -O mmp' --failnode=10.2.4...@o2ib0 /dev/sdb
 
I format the OSTs using the following command:
 
/usr/bin/time -p mkfs.lustre --reformat --ost --mkfsoptions='-J device=/dev/sdc1 -m 0' --fsname tacc --device-size=400000000 --mgsnode=10.2.4...@o2ib0 --mgsnode=10.2.4...@o2ib0 /dev/sdb
 
I mount the clients using:
 
mount -t lustre 10.2.4...@o2ib:10.2.4...@o2ib:/tacc /mnt/lustre

 

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss