> On Oct 22, 2017, at 4:35 PM, Joseph L. Casale <[email protected]>
> wrote:
>
> -----Original Message-----
> From: CentOS [mailto:[email protected]] On Behalf Of Noam
> Bernstein
> Sent: Sunday, October 22, 2017 8:54 AM
> To: CentOS mailing list <[email protected]>
> Subject: [CentOS] Areca RAID controller on latest CentOS 7 (1708 i.e. RHEL
> 7.4) kernel 3.10.0-693.2.2.el7.x86_64
>
>> Is anyone running any Areca RAID controllers with the latest CentOS 7 kernel,
>> 3.10.0-693.2.2.el7.x86_64? We recently updated (from
>> 3.10.0-514.26.2.el7.x86_64), and we’ve started having lots of problems. To
>> add to the confusion, there’s also a hardware problem (either with the
>> controller or the backplane most likely) that we’re in the process of
>> analyzing. Regardless, we have an ARC1883i, and with the older kernel the
>> system is stable, but with the new kernel it locks up within 1-12 hours of
>> boot, with errors in /var/log/messages that start with things like
>> kernel: arcmsr0: abort device command of scsi id = 0 lun = 0
>> (that is indeed the RAID scsi device) and within a few minutes of those also
>> things like
>> Oct 19 23:06:57 radon kernel: INFO: task xfsaild/dm-9:913 blocked for more than 120 seconds.
>
> You mention you have hardware problems, what are they?
They’re weird is what they are. There’s one slot that’s apparently bad. It
was first showing a failed disk (in the web interface, e.g.), but the disk is
apparently fine (as checked by putting other known-good disks into that
slot, and putting that disk into other slots or into a different machine), and
is currently listed as a hot spare, so it’s not actually being accessed. Now
that slot has apparently spontaneously fixed itself, in so far as it is showing
as a working disk. However, the lights that flash as it scans through the slots
on boot clearly behave differently for that slot than all the others (~1 s red
flash in the second scan, instead of more like 0.25 s), so I don’t believe
that it’s really fixed. But so far as I can tell, when that slot is empty the
array behaves normally, except for these errors with the new kernel only.
> A write is blocked for longer than the host is willing to wait. There are a
> few sysctl parameters that affect this, but I'd be more willing to suggest
> it's related to your hardware problems.
As I said, these errors only show up with the latest kernel, so while I agree
in principle that it makes sense for it to be related to the hardware problem,
it has to be interacting with the kernel somehow as well.
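One way to put numbers on that would be to tally the two kinds of messages
quoted above from the log saved under each kernel. What follows is only a
rough sketch: the regular expressions are guesses based on the two sample
lines in this thread, and the names tally and PATTERNS are made up for
illustration, not part of any existing tool.

```python
import re
import sys
from collections import Counter

# Patterns are assumptions modeled on the two log lines quoted in this thread.
PATTERNS = {
    "arcmsr_abort": re.compile(r"kernel: arcmsr\d+: abort device command"),
    "hung_task": re.compile(
        r"kernel: INFO: task \S+ blocked for more than \d+ seconds"
    ),
}


def tally(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Default to the log file named in this thread; pass another path to override.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for name, n in sorted(tally(fh).items()):
            print(f"{name}: {n}")
```

Running it once on a log from the older kernel and once on a log from the
newer one would show whether the arcmsr aborts really appear only with the
update.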
Noam
Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628 F +1 202 404 7546
https://www.nrl.navy.mil
_______________________________________________
CentOS mailing list
[email protected]
https://lists.centos.org/mailman/listinfo/centos