Here are my logs showing my partition trying to unmount itself. Are the five-digit numbers at the end of some of the lines PIDs, or are they inode numbers? The log shows that /d0 was unmounted, but is there any way to tell whether it unmounted cleanly?

Nov 5 17:00:54 mail2 Filesystem[21320]: [21362]: ERROR: Couldn't unmount /d0; trying cleanup with SIGTERM
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) /d0:
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr)
Nov 5 17:00:54 mail2 last message repeated 15 times
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 12046
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) m
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 12589
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) m
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 12601
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 13332
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) m
---SNIP
Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 31347
Nov 5 17:00:55 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c
Nov 5 17:00:55 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) m
Nov 5 17:00:55 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr)
Nov 5 17:00:55 mail2 Filesystem[21320]: [21364]: INFO: Some processes on /d0 were signalled
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) umount: /d0: device is busy
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) umount: /d0: device is busy
Nov 5 17:00:56 mail2 Filesystem[21320]: [21373]: ERROR: Couldn't unmount /d0; trying cleanup with SIGTERM
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) /d0:
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr)
Nov 5 17:00:56 mail2 last message repeated 15 times
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 21367
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) m
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 21371
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) m
Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr)
Nov 5 17:00:56 mail2 Filesystem[21320]: [21375]: INFO: Some processes on /d0 were signalled
Nov 5 17:00:58 mail2 Filesystem[21320]: [21379]: INFO: unmounted /d0 successfully

On 11/15/10 10:44 AM, "Syn, Joonho" <[email protected]> wrote:
>I repeated my testing process a couple times before doing this. My test
>partition didn't have much on it though, just a few simple text files
>versus the millions of files on a maildir partition for 100+ users.
>Perhaps my test partition wasn't "full" enough?
>
>On 11/13/10 10:01 AM, "Dimitri Maziuk" <[email protected]> wrote:
>
>>On 11/12/2010 7:43 PM, Syn, Joonho wrote:
>>
>>I think you have to
>>> -remove the journal of the ext3 partition "tune2fs -O ^has_journal [my
>>>device]"
>>- fsck at this point
>>> -delete and recreate the partition using fdisk
>>- resize2fs at this point
>>> -check the newly expanded partition for errors "fsck -n [my device]"
>>>
>>> At this point the fsck returned a "bad superblock error". I tested
>>using a similar setup but without heartbeat and did not get any
>>corruption. Any ideas as to what led to my bad superblocks?
>>
>>I have not done this in a while, but my recollection is you're supposed
>>to get bad superblocks when you run fsck and partition size doesn't
>>match filesystem size. So the real question is why didn't you get them
>>in your testing.
>>
>>Dima
>>_______________________________________________
>>Linux-HA mailing list
>>[email protected]
>>http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>See also: http://linux-ha.org/ReportingProblems
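
To look at the numbers from the log more closely, a small script along these lines might help. It is only a sketch: it assumes one syslog entry per line, and the names RA_STDOUT and stdout_numbers are made up for illustration. It does not settle whether the values are PIDs. That would be consistent with fuser, which prints PIDs on stdout and access-type letters such as c and m on stderr, but the log itself does not say which command produced them.

```python
import re

# Matches lrmd lines that relay the resource agent's stdout, for example:
#   "... RA output: (masterFS:stop:stdout) 12046"
# The resource and operation names are captured but not otherwise used.
RA_STDOUT = re.compile(
    r"RA output: \((?P<res>[^:]+):(?P<op>[^:]+):stdout\)\s+(?P<value>\d+)\s*$"
)


def stdout_numbers(log_lines):
    """Return the numeric values the RA printed on stdout, in log order."""
    numbers = []
    for line in log_lines:
        match = RA_STDOUT.search(line)
        if match:
            numbers.append(int(match.group("value")))
    return numbers


# A few entries copied from the log in this message.
sample = [
    "Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 12046",
    "Nov 5 17:00:54 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stderr) c",
    "Nov 5 17:00:56 mail2 lrmd: [4876]: info: RA output: (masterFS:stop:stdout) 21367",
    "Nov 5 17:00:58 mail2 Filesystem[21320]: [21379]: INFO: unmounted /d0 successfully",
]

print(stdout_numbers(sample))
```

If the values do turn out to be PIDs, the resulting list could be compared against process accounting or ps output captured around the same time to see which processes were holding /d0 open.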
