Re: [lustre-discuss] how does lustre handle node failure

Shawn via lustre-discuss Fri, 21 Jul 2023 15:07:59 -0700

Hi Laura, thanks for your reply.
It seems the OSSs share disks provisioned from a shared SAN, so each OSS
pair can fail over in a predefined manner when one node goes down,
coordinated by an HA manager.
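
To check my understanding, here is a rough sketch of that arrangement in
Python. It is only a toy model, not anything taken from Lustre or
Pacemaker: the Ost class, the node names (oss01..oss04), and the
placement() helper are all made up for illustration. Each OST has a fixed
primary and a fixed failover OSS, and whoever plays the HA manager's role
simply picks the first node of the pair that is still up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ost:
    """One OST with its statically assigned OSS pair (hypothetical names)."""
    name: str
    primary: str
    failover: str


def active_server(ost, down):
    """Return the OSS that should serve this OST, or None if both are down."""
    for node in (ost.primary, ost.failover):
        if node not in down:
            return node
    return None


# Two HA pairs; within each pair the nodes back each other up.
OSTS = [
    Ost("OST0000", "oss01", "oss02"),
    Ost("OST0001", "oss02", "oss01"),
    Ost("OST0002", "oss03", "oss04"),
    Ost("OST0003", "oss04", "oss03"),
]


def placement(down):
    """Map every OST name to the OSS serving it, given the set of down nodes."""
    return {ost.name: active_server(ost, down) for ost in OSTS}


if __name__ == "__main__":
    print(placement(set()))        # all nodes healthy
    print(placement({"oss01"}))    # oss02 takes over OST0000
```

In this model a failure only ever affects the surviving partner of the
same pair, and an OST becomes unavailable only when both nodes of its
pair are down.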


This can certainly work on a limited scale.  I'm curious whether this static
scheme can scale to a large cluster with hundreds of OSS servers.


regards,
Shawn




On Tue, Jul 18, 2023 at 1:25 PM Laura Hild <[email protected]> wrote:

> I'm not familiar with using FLR to tolerate OSS failures.  My site does
> the HA pairs with shared storage method.  It's sort of described in the
> manual
>
>   https://doc.lustre.org/lustre_manual.xhtml#configuringfailover
>
> but in more, Pacemaker-specific detail at
>
>
> https://wiki.lustre.org/Creating_a_Framework_for_High_Availability_with_Pacemaker
>
> and
>
>
> https://wiki.lustre.org/Creating_Pacemaker_Resources_for_Lustre_Storage_Services
>
>

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
