oi,

I myself have been a victim of this too, so I thought I'd join in.

The problem seems somehow related to the 2.6.8 kernel, XFS and software
raid1. I've seen the corruption on three machines now (including Joost's)
and in all cases:

- the kernel was Debian's 2.6.8;
- the filesystem in question was XFS;
- software raid1 (mirroring) was used.

Even though lvm2 was used for /usr, /var, etc, corruption occurred on the
root filesystem too, which was on a plain md device. Typical disk layout:

/dev/md0 (raid1: /dev/hda1 /dev/hdc1)
        root filesystem, XFS (agcount=2,unwritten=1)
/dev/md1 (raid1: /dev/hda3 /dev/hdc3)
        lvm2 pv, contains:
                /usr, XFS (agcount=2,unwritten=1,logsize=128m)
                /var, XFS (agcount=2,unwritten=1,logsize=128m)
                /tmp, XFS (agcount=2,unwritten=1,logsize=128m)
                /home, XFS (agsize=4g,unwritten=1,logsize=128m)

In some cases only the root filesystem was corrupted, in some cases other
filesystems too. The corruption came to light under moderately heavy I/O
pressure (such as during an apt-get dist-upgrade).
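
For anyone who wants to try to reproduce this without running a full dist-upgrade, here is a minimal sketch of a write-and-verify load generator. It is not what triggered the corruption on these machines (that was ordinary package upgrades). The function name, file names, counts and sizes are all made up for illustration.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, count=200, size=64 * 1024):
    """Write `count` files of random data, fsync each one, then re-read
    them and compare SHA-1 digests.

    Returns the list of paths whose contents did not read back intact.
    (Hypothetical helper, not part of any existing tool.)
    """
    expected = {}
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(directory, "stress-%05d.bin" % i)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data down to the md device
        expected[path] = hashlib.sha1(data).hexdigest()

    bad = []
    for path, digest in expected.items():
        with open(path, "rb") as f:
            if hashlib.sha1(f.read()).hexdigest() != digest:
                bad.append(path)
        os.remove(path)
    return bad


if __name__ == "__main__":
    # Run from a directory on the filesystem under test.
    with tempfile.TemporaryDirectory(dir=".") as d:
        mismatches = write_and_verify(d)
        print("%d mismatching files" % len(mismatches))
```

Note that the read-back will most likely be served from the page cache, so a clean run proves little about what is on disk. Unmounting and remounting the filesystem between the write and verify phases would make the check more meaningful, which would mean splitting the function in two.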

XFS complained about corrupted in-memory structures in some of the cases.
However, it is very unlikely that all three machines have bad RAM, and
memtest86+ reports no problems.

My personal hunch is that it is some bad xfs <-> raid1 interaction but
debugging is really hard to do (given that it's impossible to even run
"dmesg").

thanks,

-- 
Wessel Dankers <[EMAIL PROTECTED]>

"because Bill Gates is a Jehovah's witness and so nothing can work on
St. Swithin's day."
