Re: [Linux-HA] How to handle filesystem corruption

Max Hofer Tue, 11 Sep 2007 16:20:32 -0700

I don't think there is an automated way to solve this problem if you
implement the poor man's shared file system (drbd etc.). Remember that HA
does not mean CA (continuous availability).


The only solution I can think of is:
- write an RA which mounts the file system and has an action to recover from 
failed mounts (I think the OCF specification has such an action)
- in this recover action, try to perform an automated, non-interactive 
file system check (fsck); be aware that this may corrupt your data more than 
if you repair it manually!!
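
A rough sketch of the check-before-mount idea, in Python rather than as a
real resource agent. The function names (safe_to_mount, check_then_mount)
are made up for illustration. The exit-status bits are the ones documented
in fsck(8); individual checkers such as fsck.reiserfs may report slightly
different values, so treat the mapping as an assumption to verify. The
0/1 return values mirror OCF_SUCCESS and OCF_ERR_GENERIC.

```python
import subprocess

# Exit-status bits as documented in fsck(8); a specific checker may differ.
FSCK_ERRORS_CORRECTED = 1

# Return codes in the style of the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def safe_to_mount(fsck_status):
    """True only if fsck found nothing, or found errors and fixed them all."""
    # Any bit other than "errors corrected" (reboot needed, uncorrected
    # errors, operational error, ...) means: do not mount automatically.
    # A negative status (checker killed by a signal) is rejected as well.
    return fsck_status & ~FSCK_ERRORS_CORRECTED == 0


def check_then_mount(device, mountpoint, run=subprocess.call):
    """Run a non-interactive fsck and mount only if the result looks safe."""
    # "-a" asks fsck to repair without questions; use with caution, since
    # an automatic repair can make things worse than a manual one.
    status = run(["fsck", "-a", device])
    if not safe_to_mount(status):
        return OCF_ERR_GENERIC  # leave the resource down for an admin
    if run(["mount", device, mountpoint]) != 0:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

The point of refusing to mount on anything but a clean or fully repaired
result is that the group then stays down and visible, instead of running
for hours on a damaged file system.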

Every HA cluster should have an RA which informs an admin via SMS, email or 
whatever means when the group has been down for X minutes. That way someone 
can act on the downtime.
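
The "down for X minutes" rule could look roughly like this. It is only a
sketch: should_alert and build_alert are hypothetical helpers, and the
group name and addresses passed in are placeholders. Actually sending the
message (for example with smtplib against a local mail server) is left
out on purpose.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage


def should_alert(down_since, now, threshold):
    """True if the group has been down for at least `threshold`."""
    # down_since is None while the group is running.
    return down_since is not None and now - down_since >= threshold


def build_alert(group, down_since, sender, recipient):
    """Build the notification mail; sending it is up to the caller."""
    msg = EmailMessage()
    msg["Subject"] = f"HA group {group} down since {down_since:%Y-%m-%d %H:%M}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Resource group {group} has been down since {down_since}.\n"
        "Manual intervention may be needed.\n"
    )
    return msg
```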

On Thursday 06 September 2007, Igor D'Astolfo wrote:
> Hi,
>     I'm using linux-ha to put MySQL in high availability.
> I configured 2 nodes with MySQL in HA, with 3 resources in a group
> colocated and ordered:
>
> * the ip bound to the service
> * the partition with data (on a shared storage), formatted with reiserfs
> * the mysql service
>
> The ha works well, I can migrate the service between the nodes without
> problems.
> But yesterday I had a big issue: the node that was running the resource
> group went down for a power loss and left the data partition unclean.
>
> After the default timeouts, the other node took over the resources and
> restarted the service. BUT the partition was not clean. This wasn't
> evident to me, so the server continued to work for about two hours and
> then the filesystem started to throw kernel oopses and mysql
> stopped responding.
> I had to unmount the partition, make a fsck.reiserfs --rebuild-tree,
> remount the partition and restore from backup some files that were lost
> in the correction.
>
> My question is if it's possible to make a check on the partition before
> mounting it on the other node or if there's another way to configure the
> partition to avoid such problems.
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
