Hello,
I'm running an HA cluster with DRBD, Heartbeat, and NFS on Ubuntu Lucid
servers. I've noticed that fsck runs on the partition during failover:
Mar 4 16:04:49 xxxxxxxx ResourceManager[21113]: info: Acquiring resource group: yyyyyyyy drbddisk::r0 Filesystem::/dev/drbd0::/ha::ext4 nfs-kernel-server
Mar 4 16:04:49 xxxxxxxx ResourceManager[21113]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Mar 4 16:04:49 xxxxxxxx kernel: [65679.562157] block drbd0: role( Secondary -> Primary )
Mar 4 16:04:49 xxxxxxxx Filesystem[21180]: INFO: Resource is stopped
Mar 4 16:04:49 xxxxxxxx ResourceManager[21113]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha ext4 start
Mar 4 16:04:49 xxxxxxxx Filesystem[21266]: INFO: Running start for /dev/drbd0 on /ha
Mar 4 16:04:49 xxxxxxxx Filesystem[21266]: INFO: Starting filesystem check on /dev/drbd0
This causes a huge delay before the failover node comes online, to the point
where the primary comes back online while the secondary is still checking
the volume.
My understanding is that an fsck shouldn't be needed given the way DRBD
works, and I had set the tune2fs options on the DRBD partition to prevent
fsck on mount using 'tune2fs -c -1 -i 0 /dev/drbd0', so I was surprised to
see this.
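To rule out the tune2fs settings simply not having been applied, something
like the following sketch could be used. It parses the "Maximum mount count"
and "Check interval" lines of 'tune2fs -l' output. The function name
periodic_fsck_disabled is made up for illustration, and the sketch assumes
the usual 'Label: value' layout of that output. Note that it only covers the
periodic checks controlled by -c and -i, not an fsck that a script invokes
explicitly.

```shell
#!/bin/sh
# Hypothetical helper: reads `tune2fs -l` output on stdin and succeeds
# only if both the mount-count and the time-based checks are disabled.
periodic_fsck_disabled() {
    awk -F: '
        /^Maximum mount count/ { gsub(/[ \t]/, "", $2); count = $2 }
        /^Check interval/      { split($2, a, " "); interval = a[1] }
        END { exit !((count == "-1" || count == "0") && interval == "0") }
    '
}

# Only query the device if it exists on this node (needs root).
if [ -b /dev/drbd0 ]; then
    if tune2fs -l /dev/drbd0 | periodic_fsck_disabled; then
        echo "periodic fsck is disabled on /dev/drbd0"
    else
        echo "periodic fsck is still enabled on /dev/drbd0"
    fi
fi
```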
I've tracked the invocation down to
/usr/lib/ocf/resource.d/heartbeat/Filesystem, in this section of code:
    # Check the filesystem & auto repair.
    # NOTE: Some filesystem types don't need this step... Please modify
    # accordingly
    if [ $blockdevice = "yes" ]; then
        if [ "$DEVICE" != "/dev/null" -a ! -b "$DEVICE" ] ; then
            ocf_log err "Couldn't find device [$DEVICE]. Expected /dev/??? to exist"
            exit $OCF_ERR_INSTALLED
        fi
        if
            case $FSTYPE in
                ext3|reiserfs|reiser4|nss|xfs|jfs|vfat|fat|nfs|cifs|smbfs|ocfs2|gfs2|none|lustre)
                    false;;
                *)
                    true;;
            esac
        then
            ocf_log info "Starting filesystem check on $DEVICE"
            if [ -z "$FSTYPE" ]; then
                $FSCK -p $DEVICE
            else
                $FSCK -t $FSTYPE -p $DEVICE
            fi
            # NOTE: if any errors at all are detected, it returns non-zero
            # if the error is >= 4 then there is a big problem
            if [ $? -ge 4 ]; then
                ocf_log err "Couldn't sucessfully fsck filesystem for $DEVICE"
                return $OCF_ERR_GENERIC
            fi
        fi
    fi
It looks like the fsck is bypassed for most filesystem types, but since we
are using ext4, which is not on that list, the bypass does not happen.
Is ext4 left off the bypass list for a reason? I'd like to add it to the
list of fstypes that skip the fsck. Are there any dangers in doing so?
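For concreteness, the change I have in mind amounts to adding ext4 to the
case pattern. Here is a standalone sketch of the resulting logic. The
needs_fsck function is a hypothetical wrapper for illustration only; in the
real agent the case statement is inline, as quoted above.

```shell
#!/bin/sh
# Hypothetical wrapper around the agent's case statement, with ext4 added.
# Returns 0 (true) if fsck would run for the given fstype, 1 if bypassed.
needs_fsck() {
    case "$1" in
        ext3|ext4|reiserfs|reiser4|nss|xfs|jfs|vfat|fat|nfs|cifs|smbfs|ocfs2|gfs2|none|lustre)
            return 1;;   # on the bypass list: skip fsck
        *)
            return 0;;   # anything else, including an empty fstype: run fsck
    esac
}

# Show the decision for a few filesystem types.
for fs in ext2 ext3 ext4; do
    if needs_fsck "$fs"; then
        echo "$fs: fsck would run"
    else
        echo "$fs: fsck bypassed"
    fi
done
```

With this change ext4 would be treated the same way ext3 already is, while
types not on the list, such as ext2, would still be checked.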
Thanks for the help.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems