Brian Hirt Sat, 12 Mar 2011 11:49:52 -0800

Hello,

I'm running an HA cluster setup with DRBD/Heartbeat/NFS on Ubuntu Lucid
servers. I've noticed that during failover, fsck runs on the partition:


Mar  4 16:04:49 xxxxxxxx ResourceManager[21113]: info: Acquiring resource group: yyyyyyyy drbddisk::r0 Filesystem::/dev/drbd0::/ha::ext4 nfs-kernel-server
Mar  4 16:04:49 xxxxxxxx ResourceManager[21113]: info: Running /etc/ha.d/resource.d/drbddisk r0 start
Mar  4 16:04:49 xxxxxxxx kernel: [65679.562157] block drbd0: role( Secondary -> Primary )
Mar  4 16:04:49 xxxxxxxx Filesystem[21180]: INFO:  Resource is stopped
Mar  4 16:04:49 xxxxxxxx ResourceManager[21113]: info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha ext4 start
Mar  4 16:04:49 xxxxxxxx Filesystem[21266]: INFO: Running start for /dev/drbd0 on /ha
Mar  4 16:04:49 xxxxxxxx Filesystem[21266]: INFO: Starting filesystem check on /dev/drbd0

This causes a huge delay before the failover node comes online, even to the
point where the primary comes back online while the secondary is still
checking the volume.

My understanding is that fsck shouldn't be needed given the way DRBD works,
and I had set the tune2fs options on the DRBD partition to prevent fsck on
mount ('tune2fs -c -1 -i 0 /dev/drbd0'), so I was surprised to see this.
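One way to confirm that the `tune2fs -c -1 -i 0` settings are actually recorded in the superblock is to read them back with `tune2fs -l`. A minimal sketch, assuming the `Maximum mount count:` and `Check interval:` field labels that `tune2fs -l` prints; `periodic_checks_disabled` is a hypothetical helper name. These two settings govern only the periodic (mount-count and time-based) checks, so by themselves they do not stop a script from invoking fsck, as the log above shows.

```shell
#!/bin/sh
# Hypothetical helper: reads `tune2fs -l` output on stdin and exits 0 only if
# both periodic checks are disabled:
#   "Maximum mount count:" is -1  (mount-count based check off)
#   "Check interval:"      is  0  (time based check off)
periodic_checks_disabled() {
    awk -F: '
        /^Maximum mount count:/ { gsub(/ /, "", $2); max = $2 }
        /^Check interval:/      { split($2, a, " "); interval = a[1] }
        END {
            if (max == "-1" && interval == "0") exit 0
            exit 1
        }
    '
}

# Usage on a real system (normally needs root):
#   tune2fs -l /dev/drbd0 | periodic_checks_disabled && echo "periodic checks are off"
```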

I've tracked down the invocation to /usr/lib/ocf/resource.d/heartbeat/Filesystem,
in this section of code:

        # Check the filesystem & auto repair.  
        # NOTE: Some filesystem types don't need this step...  Please modify
        #       accordingly

        if [ $blockdevice = "yes" ]; then
                if [ "$DEVICE" != "/dev/null" -a ! -b "$DEVICE" ] ; then
                        ocf_log err "Couldn't find device [$DEVICE]. Expected /dev/??? to exist"
                        exit $OCF_ERR_INSTALLED
                fi

                if
                  case $FSTYPE in
                    ext3|reiserfs|reiser4|nss|xfs|jfs|vfat|fat|nfs|cifs|smbfs|ocfs2|gfs2|none|lustre)  false;;
                    *)  true;;
                  esac
                then
                        ocf_log info  "Starting filesystem check on $DEVICE"
                        if [ -z "$FSTYPE" ]; then
                                $FSCK -p $DEVICE
                        else
                                $FSCK -t $FSTYPE -p $DEVICE
                        fi

                        # NOTE: if any errors at all are detected, it returns non-zero
                        # if the error is >= 4 then there is a big problem
                        if [ $? -ge 4 ]; then
                                ocf_log err "Couldn't sucessfully fsck filesystem for $DEVICE"
                                return $OCF_ERR_GENERIC
                        fi
                fi
        fi
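The agent's `-ge 4` test relies on the fsck exit status being a bitwise OR of the flags documented in fsck(8), so 1 (errors corrected) and 2 (reboot advised), alone or combined, still count as success. A small sketch that decodes a status into those flags; `describe_fsck_status` is a hypothetical helper, not part of the resource agent.

```shell
#!/bin/sh
# Hypothetical helper: decode an fsck exit status into the flags listed in
# fsck(8). The status is a bitwise OR of these values, so several lines may
# be printed for one status.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1))   -ne 0 ] && echo "filesystem errors corrected"
    [ $((status & 2))   -ne 0 ] && echo "system should be rebooted"
    [ $((status & 4))   -ne 0 ] && echo "filesystem errors left uncorrected"
    [ $((status & 8))   -ne 0 ] && echo "operational error"
    [ $((status & 16))  -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32))  -ne 0 ] && echo "checking canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0   # the last test above may be false; the helper itself succeeded
}
```

For example, a status of 4 decodes to "filesystem errors left uncorrected", which is the agent's failure case, while a status of 3 decodes to errors corrected plus reboot advised and still passes the `-ge 4` test.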


It looks like the fsck is bypassed for most filesystem types, but since we
are using ext4, the bypass doesn't apply. Is ext4 excluded for a reason? I'd
like to add ext4 to the list of filesystem types that are bypassed, but I'm
wondering whether there is a reason it was left out. Are there any dangers to
adding ext4 to the bypass list?
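The bypass works through the `if case ... esac then` construct: listed types hit the `false` arm, so the `then` branch containing the fsck never runs, while every other type (including ext4, and an empty type) falls through to `true`. A minimal sketch that isolates that decision and compares the shipped list with one that has `ext4` added; the function names are hypothetical, and this only shows what the edit would change, not whether skipping the check is safe.

```shell
#!/bin/sh
# Hypothetical standalone copies of the Filesystem agent's skip-list test.
# Each returns 0 if fsck would run for filesystem type $1, 1 if it is skipped.

# Skip list as shipped (copied from the excerpt above); ext4 is absent.
needs_fsck_current() {
    case $1 in
        ext3|reiserfs|reiser4|nss|xfs|jfs|vfat|fat|nfs|cifs|smbfs|ocfs2|gfs2|none|lustre)
            return 1 ;;   # the agent's "false" arm: fsck skipped
        *)
            return 0 ;;   # the agent's "true" arm: fsck runs
    esac
}

# Skip list with ext4 added, as proposed in the question.
needs_fsck_proposed() {
    case $1 in
        ext3|ext4|reiserfs|reiser4|nss|xfs|jfs|vfat|fat|nfs|cifs|smbfs|ocfs2|gfs2|none|lustre)
            return 1 ;;
        *)
            return 0 ;;
    esac
}

for t in ext3 ext4; do
    needs_fsck_current "$t" && cur=runs || cur=skipped
    needs_fsck_proposed "$t" && new=runs || new=skipped
    echo "$t: current=$cur proposed=$new"
done
```

Running it prints `ext3: current=skipped proposed=skipped` and `ext4: current=runs proposed=skipped`.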

Thanks for the help.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

[Linux-HA] Question about ext4/drbd/heartbeat
