Re: Fwd: Re: [GENERAL] SSDD reliability

Toby Corkindale Wed, 18 May 2011 18:11:22 -0700

On 19/05/11 10:50, mark wrote:

Note 1:
I have seen an array that was powered on continuously for about six
years, which killed half the disks when it was finally powered down,
left to cool for a few hours, then started up again.



Recently we rebooted about 6 machines that had uptimes of 950+ days.
The last time fsck had run on the file systems was in 2006.

When stuff gets that old and has been online and under heavy load all that
time, you actually get paranoid about reboots. In my newly reaffirmed
opinion, at that stage reboots are at best a crap shoot. That gamble cost us
several hours more than we had budgeted for. HP is getting more of
their gear back than in a usual month.

I worked at one place, years ago, which had an odd policy: they had automated hard resets hit all their servers on a Friday night, every week.

I thought they were mad at the time!

But it does mean that people design and test the systems so that they can survive unattended resets reliably. (No one wants to get a support call at 11pm on Friday because their server didn't come back up.)

It still seems a bit messed up though. Even if Friday night is a low-use period, it still means causing a small amount of disruption to customers, especially if a developer or sysadmin messed up and a server *doesn't* come back up.


