Hi,

On Mon, Jun 07, 2010 at 11:52:49AM +0900, Takatoshi MATSUO wrote:
> Hi
> 
> (2010/06/04 23:18), Dejan Muhamedagic wrote:
> > Hi,
> >
> > On Fri, Jun 04, 2010 at 02:16:42PM +0200, Bernd Schubert wrote:
> >> On Friday 04 June 2010, Dejan Muhamedagic wrote:
> >>> Hi Takatoshi-san,
> >>>
> >>> On Fri, Jun 04, 2010 at 02:19:42PM +0900, Takatoshi MATSUO wrote:
> >>>> Hello
> >>>>
> >>>> I suggest adding a parameter to the Filesystem RA that lets the
> >>>> user decide, as a matter of policy, whether fsck is executed.
> >>>>
> >>>> The current RA does not check ext3, because whether fsck is executed
> >>>> depends on the filesystem type.
> >>>> But ext3 is sometimes corrupted and remounted read-only although it
> >>>> has a journal, so
> >>>
> >>> Under which circumstances does this happen?
> 
> It happens in tests such as pulling the disk cable,
> crashing the OS, and so on.
> But I don't know the detailed circumstances because it's very, very rare.
> 
> As Bernd says, no filesystem is perfect,
> and no hardware or driver is perfect either.
> 
> >> No filesystem is perfect ;) And any kind of hardware issue can cause
> >> filesystem and data corruption.
> >
> > Filesystem corruption? That's not exactly what I had in mind :)
> > I'm not sure if fsck would help in that case anyway. Not saying
> > that that never happens (hw or bugs), but what I meant is
> > "normal" (say, on stonith) failovers where no fs corruption occurs.
> >
> >> Takatoshi-san, you should note, however, that e2fsck, for example, will
> >> start to run in non-auto mode even if only a journal recovery is required.
> >> With default extX parameters, it might then easily perform a complete
> >> filesystem check, which might last hours. Not only might you get
> >> unexpectedly long downtime, you also need to be aware that fsck time is
> >> often MUCH longer than the resource start timeout. If that happens,
> >> pacemaker will kill fsck in the middle of a run, which might damage your
> >> filesystem even more.
> 
> I am aware of the complete filesystem check,
> so if I use this parameter with "force", I'll disable "max-mount-counts"
> and "time-last-checked" using the tune2fs command.
> 
> But I was not aware that pacemaker will kill fsck.
> 
> >> That is all fine if you know about possible consequences, but I really
> >> doubt that most admins are aware of that.
> >
> > Most admins are not aware of most things ;-)
> >
> >>>> I want to decide for myself whether fsck is executed before mounting,
> >>>> so that I can operate more safely.
> >>>>
> >>>> This new parameter has three modes: "auto", "force" and "no".
> >>>> The default is "auto", which does the same thing as before.
> >>>> "force" and "no" mean what they say.
> >>>
> >>> Patch applied. Many thanks!
> 
> Thanks.
> 
> >> That brings up an idea here: with extX, we could easily use
> >>
> >> dumpe2fs -h | grep "Filesystem state:"
> >>
> >> to check if fsck needs to be run. So the agent could refuse to mount the
> >> device and make you run it manually in the foreground without any
> >> timeouts...
> >> I will implement that for our lustre_server agent (a heavily modified
> >> Filesystem agent) and then possibly back-port the patch.
> >
> > That may be a good idea, provided that one can tell how long
> > e2fsck would take.
> 
> It sounds good, but "dumpe2fs" is specific to ext2 and ext3.
> If killing fsck can damage the filesystem, may I rewrite this patch
> based on this idea?

It's probably a good idea not to interrupt fsck. I don't know if
there is a way to figure out if e2fsck is going to do the journal
recovery or a full filesystem check.
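
For ext2/ext3, a rough sketch of such a check could look like the shell
function below. It is only a heuristic over the header fields printed by
"dumpe2fs -h", not e2fsck's actual decision logic. The function name
classify_e2fs_state and its result strings are made up for illustration,
and the time-based check interval is deliberately ignored.

```shell
# Hypothetical helper: reads "dumpe2fs -h" output on stdin and prints one of
#   clean | journal-recovery | full-check | unknown
# Assumption: a state other than "clean", or an exhausted mount count,
# is treated as "a full check is likely".
classify_e2fs_state() {
    awk -F': *' '
        /^Filesystem state:/    { state = $2; sub(/[ \t]+$/, "", state) }
        /^Filesystem features:/ { features = $2 }
        /^Mount count:/         { mounts = $2 + 0 }
        /^Maximum mount count:/ { max_mounts = $2 + 0 }
        END {
            if (state == "")                                 print "unknown"
            else if (state != "clean")                       print "full-check"
            else if (max_mounts > 0 && mounts >= max_mounts) print "full-check"
            else if (features ~ /needs_recovery/)            print "journal-recovery"
            else                                             print "clean"
        }'
}
```

It would be fed like this (the device name is a placeholder):

    dumpe2fs -h /dev/sdX 2>/dev/null | classify_e2fs_state

An agent could then refuse to start on "full-check" and leave the fsck
to the admin, as Bernd suggested.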

Thanks,

Dejan

> > Cheers,
> >
> > Dejan
> >
> >> Cheers,
> >> Bernd
> 
> Regards.
> Takatoshi MATSUO
> 
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/