Is a failover OST supposed to provide both read and write access, or only read access? I set up 2 OSTs with 2 failovers. If I pull the network cable from one of the OSTs, I can still read from the Lustre filesystem, but I cannot write to it.

I tried several experiments this afternoon with the same 4 OSTs. If I make the 4 OSTs failovers for each other, i.e. A->B, B->A, I could read from the Lustre filesystem, and CLI commands like ls worked fine. However, any disk command like df would hang that console until whichever OST I took down came back online.

What is the purpose of having a failover node if you can't write to the designated failover node when one of the OSTs is unavailable?

This is what I see when I take one down. As you can see, it detects that the node is down, then switches to the failover node:

    Lustre: alamofs-OST0001-osc-f6aa9600: Connection to service alamofs-OST0001 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
    Lustre: Changing connection for alamofs-OST0001-osc to [EMAIL PROTECTED]/[EMAIL PROTECTED]

What I used to format the OSTs:

    mkfs.lustre --fsname=alamofs --ost --failnode=compute-0-8 [EMAIL PROTECTED] /dev/md0

-- 
Jeremy Mann
[EMAIL PROTECTED]
University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672
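
P.S. In case it helps anyone reproduce this, here is a rough sketch of how the two console messages above could be checked from a script. The regular expressions are my own assumption, based only on the two messages quoted in this post, so other Lustre versions may word them differently. The function name summarize_failover is hypothetical, and the addresses are left redacted exactly as above.

```python
import re

# The two console messages quoted in this post, addresses left redacted.
LOG_LINES = [
    "Lustre: alamofs-OST0001-osc-f6aa9600: Connection to service "
    "alamofs-OST0001 via nid [EMAIL PROTECTED] was lost; in progress "
    "operations using this service will wait for recovery to complete.",
    "Lustre: Changing connection for alamofs-OST0001-osc to "
    "[EMAIL PROTECTED]/[EMAIL PROTECTED]",
]

# Assumed patterns, derived only from the two messages above.
LOST_RE = re.compile(r"Connection to service (\S+) via nid .* was lost")
SWITCH_RE = re.compile(r"Changing connection for (\S+)-osc to ")


def summarize_failover(lines):
    """Map each OST target to whether a lost connection and a switch were logged."""
    status = {}
    for line in lines:
        lost = LOST_RE.search(line)
        if lost:
            entry = status.setdefault(
                lost.group(1), {"lost": False, "switched": False}
            )
            entry["lost"] = True
        switch = SWITCH_RE.search(line)
        if switch:
            entry = status.setdefault(
                switch.group(1), {"lost": False, "switched": False}
            )
            entry["switched"] = True
    return status


if __name__ == "__main__":
    for target, flags in summarize_failover(LOG_LINES).items():
        print(target, flags)
```

This only confirms that the connection was switched to the failover node. It says nothing about whether recovery actually completed there, which is the part I am asking about.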
