I would think all disk filesystems suffer from this type of problem. OCFS2 has it too (see check_ocfs2), plus additional problems that come with clustered disks. Switching to OCFS2 is not going to make your life easier.
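
For what it's worth, the idea discussed in the quoted thread below (run a check only after an unclean failover, and bound it with a timeout) could be sketched roughly like this. It is only an illustration, not something Heartbeat provides: the function name check_then_mount, its parameters and the status strings are all made up here, and the actual check and mount commands depend on your filesystem.

```python
import subprocess
from typing import Sequence


def check_then_mount(check_cmd: Sequence[str],
                     mount_cmd: Sequence[str],
                     unclean_failover: bool,
                     check_timeout_s: float = 3600.0) -> str:
    """Hypothetical helper: optionally check a filesystem, then mount it.

    Returns one of "mounted", "check_failed", "check_timed_out",
    "mount_failed". Any non-zero exit status from check_cmd is treated
    as a failed check (a deliberately conservative assumption).
    """
    if unclean_failover:
        # Only pay for the check when the previous node died uncleanly;
        # a manual migration unmounts cleanly and skips this branch.
        try:
            check = subprocess.run(check_cmd, timeout=check_timeout_s)
        except subprocess.TimeoutExpired:
            # The check ran longer than allowed: leave the resource
            # stopped so that an operator can decide what to do.
            return "check_timed_out"
        if check.returncode != 0:
            return "check_failed"

    mount = subprocess.run(mount_cmd)
    return "mounted" if mount.returncode == 0 else "mount_failed"
```

You would pass the check invocation for your filesystem as check_cmd and the mount command as mount_cmd. Anything other than "mounted" means the resource should stay down until someone has looked at it. Picking check_timeout_s is the hard part, as noted in the thread: a full check of a large filesystem can take hours.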

On 9/6/07, Igor D'Astolfo <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > On Thu, Sep 06, 2007 at 12:24:14PM +0200, Igor D'Astolfo wrote:
> > > Hi,
> > > I'm using linux-ha to put MySQL in high availability.
> > > I configured 2 nodes with MySQL in HA, with 3 resources in a group,
> > > colocated and ordered:
> > >
> > > * the IP bound to the service
> > > * the partition with data (on shared storage), formatted with reiserfs
> > > * the mysql service
> > >
> > > The HA works well, I can migrate the service between the nodes without
> > > problems.
> > > But yesterday I had a big issue: the node that was running the resource
> > > group went down because of a power loss and left the data partition
> > > unclean.
> > >
> > > After the default timeouts, the other node took over the resources and
> > > restarted the service. BUT the partition was not clean. This wasn't
> > > evident to me, so the server continued to work for about two hours and
> > > then the filesystem started to give kernel oopses and mysql stopped
> > > responding.
> > > I had to unmount the partition, run fsck.reiserfs --rebuild-tree,
> > > remount the partition and restore from backup some files that were lost
> > > in the correction.
> > >
> > > My question is whether it's possible to run a check on the partition
> > > before mounting it on the other node, or whether there's another way to
> > > configure the partition to avoid such problems.
> >
> > This is arguably a case of software failing in an unexpected way.
> > Journaled filesystems should guarantee integrity of data and
> > metadata. That's why one uses them. And to avoid very time
> > consuming filesystem check procedures on boot. Unfortunately,
> > there is usually no quick way to find out if the filesystem is
> > good.
> >
> > Otherwise, it is of course possible to do a filesystem check
> > before mounting it. But it will cost time.
> > And it would make the startup procedure heavily dependent on the
> > filesystem size and its nature. Sometimes, it could even last for
> > hours. The timeouts would be really tricky to estimate. At any rate,
> > perhaps this could be made an option and then left to the user to
> > decide if their filesystem needs extra checking on mount.
>
> I agree with you, the check shouldn't be done automatically, but there
> could be a check on the cause of the switch of the resource.
> E.g. if the resource is switching nodes because I issued a migration,
> a check is not necessary, but if the switch is caused by a node
> failure, it might make sense to force a check or to keep the resource
> stopped until the user intervenes.
>
> So, at the moment this problem (using a reiserfs filesystem) has no
> solution. Is there a way to avoid this using other filesystems (OCFS?)?
>
> Regards.
>
> > Dejan
>
> --
> Igor D'Astolfo
> SMART.it, Sistemi e Software

_______________________________________________
Linux-HA mailing list
Linux-HA at lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
