Re: [CentOS] Software raid Oddity

2020-12-16 Thread Christopher Wensink
I had a similar issue years ago when I helped out a former 
employer with a Dell PowerEdge system running a RAID 5 array (Windows). The 
system refused to boot, but the lights on the front of the 
backplane where the drives slide in indicated drive fault (amber) or 
drive OK (green).  One of the tests I did was to rearrange the drives 
in the backplane. After I did that, the same lights (slots) went amber, 
so the fault followed the slot rather than the drive.


The problem wasn't the drives at all; the controller card was going 
bad.  The full-time IT guy there ended up shipping the drives off to a 
recovery service depot, and they recovered the data there, no problem.


When I worked for Sage we had SCSI RAID controller cards with a 
similar function: the RAID card config was backed up on the 
drives, and the drive configuration was stored in the RAID controller, 
so each held a backup of the other's config.


In the event of a controller card failure, the same model card 
could be put into the system and the config data pulled from the 
recovery location on the drives, and the system was back up and running 
again.


Perhaps that is what's happening to your system.

I would take several full bare-metal backups right now (and test 
restoring the data onto a new system); there may be a hardware failure 
looming around the corner.


Chris

On 12/16/2020 3:10 PM, Frank Cox wrote:

> On Wed, 16 Dec 2020 13:57:13 -0700
> Paul R. Ganci via CentOS wrote:
>
> > My gut suggests that the raid array was never degraded and that my
> > system (i.e. cat /proc/mdstat) was lying to me. Any Opinions?
>
> I wonder if it's a ram failure in either the main computer or the drive 
> controller.  An intermittent ram failure (or cold solder joint or something 
> equally hard to track down) could cause all manner of un-repeatable weirdness.



--
Christopher Wensink
IS Administrator
Five Star Plastics, Inc
1339 Continental Drive
Eau Claire, WI 54701
Office:  715-831-1682
Mobile:  715-563-3112
Fax:  715-831-6075
cwens...@five-star-plastics.com
www.five-star-plastics.com


___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Software raid Oddity

2020-12-16 Thread Frank Cox
On Wed, 16 Dec 2020 13:57:13 -0700
Paul R. Ganci via CentOS wrote:

> My gut suggests that the raid array was never degraded and that my 
> system (i.e. cat /proc/mdstat) was lying to me. Any Opinions?

I wonder if it's a ram failure in either the main computer or the drive 
controller.  An intermittent ram failure (or cold solder joint or something 
equally hard to track down) could cause all manner of un-repeatable weirdness.

-- 
MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com


[CentOS] Software raid Oddity

2020-12-16 Thread Paul R. Ganci via CentOS
I have a CentOS 7.9 system with a software raid 6 root partition. Today 
something very strange occurred. At 6:45AM the system crashed. I 
rebooted and when the system came up I had multiple emails indicating 
that 3 out of 6 drives had failed on the root partition. Strangely I was 
able to boot into the system and everything was working correctly despite


> cat /proc/mdstat

also indicating 3 out of 6 drives had failed. Since the system was up 
and running despite more than 2 drives having failed in the root 
raid array, I decided to reboot the system. Actually, I shut it down, 
waited for the drives to spin down, and then restarted. This time when it 
came back, the 3 missing drives were back in the array and a cat 
/proc/mdstat indicated all 6 drives were again in the raid 6 array. So a 
few questions:


1.) If 3 out of 6 drives of a raid 6 array supposedly fail, how does the 
array still function?

2.) Why would a shutdown/restart sequence supposedly fix the array?

3.) My gut suggests that the raid array was never degraded and that my 
system (i.e. cat /proc/mdstat) was lying to me. Any Opinions?
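
For reference on question 1: raid 6 carries two parity blocks per stripe, so 
it can lose at most 2 members and keep running. A rough Python sketch of that 
check follows, assuming the usual "[n/m] [UU_]" status field from 
/proc/mdstat (n member slots, m currently active). The sample line, the 
check_array helper and the TOLERATED table are made up for illustration and 
are not taken from the system described here.

```python
import re

# Matches the "[6/3] [UUU___]" field of an md status line:
# total member slots, members currently active, and one flag per slot.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Member losses each level can survive (only the two parity levels are covered).
TOLERATED = {"raid5": 1, "raid6": 2}


def check_array(status_line, level):
    """Report whether an array in the given state could still be running."""
    match = STATUS_RE.search(status_line)
    if match is None:
        raise ValueError("no '[n/m] [UU_]' field found")
    total, active = int(match.group(1)), int(match.group(2))
    missing = total - active
    return {
        "total": total,
        "active": active,
        "missing": missing,
        "flags": match.group(3),
        "survivable": missing <= TOLERATED[level],
    }


# Hypothetical status line for a 6-member raid 6 with 3 members marked down.
sample = "3906764800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/3] [UUU___]"
print(check_array(sample, "raid6"))
```

By this arithmetic, a raid 6 array that had truly lost 3 of 6 members could 
not be serving a root filesystem, which is consistent with the suspicion 
that the reported state did not match what the array was actually doing.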


Has anybody else ever seen such strange behavior?
--
Paul (ga...@nurdog.com)
Cell: (303)257-5208