James Manning writes:
> What worries me is that what looks like is happening is that the
> md-layer is passing a very-invalid sector request (for whatever reason
> it got that far) down to the devices making up your raid1 and since the
> ll_rw_blk::make_request() fails the md-layer tags that as a failing device
> (without having checked the request against valid size itself) and moves
> on, failing on successive devices (same reason, esp. in raid1 :) until
> it gives up and just reschedules the block all over again, eventually just
> failing altogether.
The faulty blocks were retried several times that night (it seems
that one file of the squid cache and one mailbox were lost), but
nothing else has happened so far. The backup works, but I can't read
the kernel messages anymore, because klogd refuses to work (after
a restart):
Jan 10 11:06:42 picard kernel: klogd 1.3-3, log source = /proc/kmsg started.
Jan 10 11:06:42 picard kernel: Cannot find map file.
Jan 10 11:06:42 picard kernel: Error seeking in /dev/kmem
Jan 10 11:06:42 picard kernel: Error adding kernel module table entry.
Is it just bad luck, or might those two problems have the same cause?
> Is this a correct interpretation? If so, it seems like either struct
> mddev_s or struct mirror_info needs a size/sect_count/whatever parameter
> added to check against the buffer_head being requested... I don't see
> a make_request path back that can handle this case on its own...
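A rough sketch in C of the kind of range check suggested above. The
struct and field names (demo_array, sect_count, demo_request) are
hypothetical stand-ins for illustration, not the kernel's real mddev_s,
mirror_info or buffer_head definitions, and where such a check would
actually belong in the md code is an open question:

```c
/* Hypothetical stand-in for the request; NOT the kernel's buffer_head. */
struct demo_request {
    unsigned long start_sector;  /* first sector requested */
    unsigned long nr_sectors;    /* length of the request in sectors */
};

/* Hypothetical stand-in for the array; NOT the kernel's mddev_s. */
struct demo_array {
    unsigned long sect_count;    /* assumed new field: array size in sectors */
};

/*
 * Return 1 if the request lies entirely inside the array, 0 otherwise.
 * The idea is to reject an out-of-range request once, at the md layer,
 * instead of passing it down and marking each mirror as failed.
 */
static int demo_request_in_range(const struct demo_array *a,
                                 const struct demo_request *r)
{
    if (r->nr_sectors == 0)
        return 0;
    if (r->start_sector >= a->sect_count)
        return 0;
    /* Written as a subtraction so the comparison cannot overflow. */
    return r->nr_sectors <= a->sect_count - r->start_sector;
}
```

With a check like this, a request past the end of the array would be
refused before any member device is touched, so no device would be
tagged as failing because of it.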
So is it a bug in the raid code, or is the raid device corrupted? What
might happen if I reboot the whole machine? It's a production server,
so I'm quite nervous...
The only other thing I changed on this server was to add a Tekram
DC395UW controller, used solely for the QIC, but this should have no
effect, should it?
Thanks,
Jochen
--
# mgm ComputerSysteme und -Service GmbH
# Sophienstr. 26 / 70178 Stuttgart / Germany / Voice: +49.711.96683-5
The Internet treats censorship as a malfunction and routes around it.
--John Perry Barlow