Hi,

On Mon, Jun 07, 2010 at 11:52:49AM +0900, Takatoshi MATSUO wrote:
> Hi
>
> (2010/06/04 23:18), Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Fri, Jun 04, 2010 at 02:16:42PM +0200, Bernd Schubert wrote:
> >> On Friday 04 June 2010, Dejan Muhamedagic wrote:
> >>> Hi Takatoshi-san,
> >>>
> >>> On Fri, Jun 04, 2010 at 02:19:42PM +0900, Takatoshi MATSUO wrote:
> >>>> Hello
> >>>>
> >>>> I suggest adding a parameter which decides whether to execute fsck
> >>>> according to the user's policy in the Filesystem RA.
> >>>>
> >>>> The current RA does not check ext3, because executing fsck depends
> >>>> on the filesystem.
> >>>> But ext3 is sometimes broken and remounted read-only although it has
> >>>> a journal, so
> >>>
> >>> Under which circumstances does this happen?
>
> It happens during tests such as pulling the disk cable,
> crashing the OS, and so on.
> But I don't know the detailed circumstances because it's very, very rare.
>
> Like Bernd says, no filesystem is perfect,
> and no hardware and no driver is perfect either.
>
> >> No filesystem is perfect ;) And any kind of hardware issue can cause
> >> filesystem and data corruption.
> >
> > Filesystem corruption? That's like not what I exactly had in mind :)
> > I'm not sure if fsck would help in that case anyway. Not saying
> > that that never happens (hw or bugs), but what I meant is
> > "normal" (say, on stonith) failovers where no fs corruption occurs.
> >
> >> Takatoshi-san, you should notice however, that for example e2fsck will
> >> start to run in non-auto mode, even if only a journal recovery is
> >> required. With default extX parameters, it then easily might perform a
> >> complete filesystem check, which might last hours. Not only might you
> >> get unexpectedly long down time, you also need to be aware that fsck
> >> time is often MUCH longer than the resource start timeout. If that
> >> happens, pacemaker will kill fsck in the middle of a run, which might
> >> damage your filesystem even more.
>
> I was aware of the complete filesystem check,
> so if I use this parameter with force, I'll disable "max-mount-counts"
> and "time-last-checked" using tune2fs commands.
>
> But I was not aware that pacemaker will kill fsck.
>
> >> That is all fine if you know about possible consequences, but I really
> >> doubt that most admins are aware of that.
> >
> > Most admins are not aware of most things ;-)
> >
> >>>> I want to decide myself whether to execute fsck before mount, to
> >>>> operate more safely.
> >>>>
> >>>> This new parameter has three modes: "auto", "force" and "no".
> >>>> Default is "auto", which does the same thing as before.
> >>>> "force" and "no" mean what they say.
> >>>
> >>> Patch applied. Many thanks!
>
> Thanks.
>
> >> That brings up an idea here, with extX, we could easily use
> >>
> >> dumpe2fs -h | grep "Filesystem state:"
> >>
> >> to check if fsck needs to be run. So the agent could refuse to mount the
> >> device and make you run it manually in the foreground without any
> >> timeouts...
> >> I will implement that for our lustre_server agent (a heavily modified
> >> Filesystem agent) and then possibly back-port the patch.
> >
> > That may be a good idea. Given that one can say how long would
> > e2fsck take.
>
> It sounds good, but "dumpe2fs" is specific to ext2 and ext3.
> If killing fsck damages the filesystem, can I rewrite this patch
> based on this idea?
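
A minimal sketch of the check Bernd describes, for illustration only.
It assumes an ext2/ext3 device and that the "Filesystem state:" field
printed by dumpe2fs -h is a good enough signal, which this thread does
not establish. The function names fs_state_from_header and
ext_needs_manual_fsck are hypothetical and not part of the Filesystem
RA. The parsing reads the header from stdin, so it can be tried without
a real device.

```shell
#!/bin/sh
# Sketch only: both function names below are hypothetical helpers,
# not part of the Filesystem resource agent.

# Print the value of the "Filesystem state:" field found in
# `dumpe2fs -h` output supplied on stdin (first match only).
fs_state_from_header() {
    sed -n 's/^Filesystem state:[[:space:]]*//p' | head -n 1
}

# Return 0 (true) when the state read from stdin is anything other
# than exactly "clean". A missing field also counts as "needs a
# manual fsck", which is the conservative choice.
ext_needs_manual_fsck() {
    state=$(fs_state_from_header)
    [ "$state" != "clean" ]
}

# Possible use before mounting (DEVICE is a placeholder, needs root):
#   if dumpe2fs -h "$DEVICE" 2>/dev/null | ext_needs_manual_fsck; then
#       echo "refusing to mount $DEVICE: run e2fsck manually" >&2
#       exit 1
#   fi
```

Refusing to mount, rather than starting fsck from the agent, keeps the
check itself short and leaves the long-running e2fsck outside the
resource start timeout.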

It's probably a good idea not to interrupt fsck. I don't know if
there is a way to figure out if e2fsck is going to do the journal
recovery or a full filesystem check.

Thanks,

Dejan

> > Cheers,
> >
> > Dejan
> >
> >> Cheers,
> >> Bernd
>
> Regards.
> Takatoshi MATSUO
>
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
