Hi all,

We have built a Lustre cluster server environment on CentOS 7 with Lustre 2.12.7. The clients are running 2.12.5.

The setup consists of three clusters for a 3 PB filesystem. One is a two-node cluster built for the MGS and the MDTs. The other two are also two-node clusters, used for the OSTs.

The cluster framework is working as expected.
The servers are connected in a multi-rail network, because some clients are on InfiniBand and the other clients are on Ethernet.

But we have the following problem. When an OST fails over to the second node, the clients are unable to contact the OST that is started on the other node. The OST recovery status is "waiting for clients". When we fail it back, it starts working again and the recovery status is complete. We tried to abort the recovery, but that does not work.

We used these documents to build the cluster:

https://wiki.lustre.org/Creating_the_Lustre_Management_Service_(MGS)
https://wiki.lustre.org/Creating_the_Lustre_Metadata_Service_(MDS)
https://wiki.lustre.org/Creating_Lustre_Object_Storage_Services_(OSS)
https://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services

I'm not sure what the next steps should be to find the problem, or where to look.

Best regards,
Koos Meijering
........................................................................
HPC Team
Rijksuniversiteit Groningen
........................................................................
