Hi,

On Mon, Mar 14, 2011 at 09:39:58AM +0100, Ulrich Windl wrote:
> >>> Dejan Muhamedagic <[email protected]> wrote on 21.02.2011 at 17:43 in
> message <20110221164331.GA3603@squib>:
> > Hi,
> > 
> > On Fri, Feb 18, 2011 at 11:56:49AM -0500, Tony Nelson wrote:
> > > Hi All,
> > > 
> > > I have a small cluster configured like this:
> > > 
> > > [-------------- config -----------------]
> > > root@ihdb2:~# crm configure show
> > > node $id="3888bf0f-3e06-4ad8-a2c2-297451128d3d" ihdb1
> > > node $id="a1f70384-6684-47e6-ba00-ed082dee7a56" ihdb2
> > > primitive bacula-fd lsb:bacula-fd.local \
> > >   meta target-role="Started"
> > > primitive dbip ocf:heartbeat:IPaddr2 \
> > >   params ip="192.168.44.22" nic="eth0" \
> > >   op start interval="0" timeout="120s" \
> > >   op monitor interval="30s" timeout="20s"
> > > primitive fs0 ocf:heartbeat:Filesystem \
> > >   params fstype="ext3" directory="/var/lib/postgresql" device="/dev/vg01/postgresql" options="noatime" \
> > >   op start interval="0" timeout="60s" \
> > >   op stop interval="0" timeout="60s" \
> > >   meta target-role="Started"
> > > primitive iscsi ocf:heartbeat:iscsi \
> > >   params portal="192.168.43.28" target="iqn.2001-05.com.equallogic:0-8a0906-a6bb3d802-25aca117e304cae3-ihdb" \
> > >   op start interval="0" timeout="120s" \
> > >   op monitor interval="30s" timeout="30s" \
> > >   op stop interval="0" timeout="120s" \
> > >   meta target-role="Started"
> > > primitive psql lsb:postgresql-8.4 \
> > >   meta target-role="Started"
> > > group psql-group iscsi fs0 dbip bacula-fd psql \
> > >   meta target-role="Started"
> > > property $id="cib-bootstrap-options" \
> > >   dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> > >   cluster-infrastructure="Heartbeat" \
> > >   stonith-enabled="false" \
> > >   last-lrm-refresh="1291165836" \
> > >   no-quorum-policy="ignore"
> > > rsc_defaults $id="rsc-options" \
> > >   resource-stickiness="100"
> > > [ -------------- end config --------------]
> > > 
> > > This morning the postgres server started logging errors because of
> > > corrupted data files.
> > > 
> > > I stopped all of the services except for the iscsi one and manually
> > > mounted the filesystem.  The system said something like "Warning:
> > > mounting a filesystem with errors".  Sorry I don't have the exact
> > > messages.
> > > 
> > > I unmounted the filesystem, did a fsck manually, then restarted the
> > > services.
> > > 
> > > Is there any way to have heartbeat fsck the filesystem like a normal
> > > mount from fstab would?  Did I miss a step?
> > 
> > No. ext3 is a filesystem with a journal, so it is considered
> > that it can recover without fsck. Otherwise, there's a parameter
> > called run_fsck, check the meta data: crm ra info Filesystem.
> > 
> > BTW, it is very unusual (and suspicious) that the filesystem
> > starts having errors just like that, while the system's running.
> > You should find what caused the corruption.
> 
> On HP-UX with Serviceguard and VxFS (Journaled Filesystem) the filesystem is 
> checked every time before it is mounted: If it's clean nothing is done; if 
> not, either the journal is replayed or a full structural consistency check is 
> run (if a severe corruption was detected).
> Remember: A node could also go down because of a memory failure (which might 
> corrupt the filesystem).

Right.

> So I think checking a filesystem before mount is a good thing.

Setting the run_fsck parameter to "force" would enforce running
fsck. The problem is that that may take a very long time.
Unfortunately, we don't have a way to tell the cluster that we
need extra time to start the resource, so it may happen that
fsck starts and then times out on all nodes. Depending on
cluster configuration it may try to start, and then time out,
quite a few times.
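
For illustration, here is a rough sketch of how the fs0 primitive from
the original configuration might look with forced fsck. The
run_fsck="force" setting is the parameter mentioned above. The 1800s
start timeout is only an assumed placeholder, not a recommendation: it
would have to be sized to the worst-case fsck duration for the volume,
since the timeout is static and cannot be extended while fsck is
running.

```
# Sketch only: the start timeout value is an assumption and needs to be
# measured for the actual volume.
primitive fs0 ocf:heartbeat:Filesystem \
  params fstype="ext3" directory="/var/lib/postgresql" device="/dev/vg01/postgresql" options="noatime" run_fsck="force" \
  op start interval="0" timeout="1800s" \
  op stop interval="0" timeout="60s" \
  meta target-role="Started"
```

The trade-off is that a generous start timeout also delays failure
detection when a start hangs for some other reason.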

Thanks,

Dejan

> Regards,
> Ulrich
> 
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
