> On Apr 10, 2017, at 4:30 PM, Machine Man <gearbo...@outlook.com> wrote:
> 
> Do you select drives based on DWPD?

Not really. Inside a given product line, the difference in DWPD is a matter of 
overprovisioning.
You can adjust the overprovisioning yourself, if needed.

Note to the lurkers: overprovisioning also affects write performance during
garbage collection.
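To make the DWPD point concrete: DWPD is rated host writes divided by (user capacity x warranty days). If two drives in a product line share the same raw NAND and the same host-write budget, the one that exposes less capacity (more overprovisioning) gets a higher DWPD rating. This is a minimal sketch under that simplified model. The 1 TB of raw NAND, the 1.5 PB write budget, the 5-year warranty and the two OP levels are hypothetical numbers, not figures for any real drive.

```python
# Hypothetical endurance model (not a vendor formula): every capacity point in
# a product line uses the same raw NAND; only the spare area (OP) differs.

DAYS_PER_YEAR = 365


def user_capacity(raw_nand_bytes: float, op_fraction: float) -> float:
    """User-visible capacity when OP is defined as (raw - user) / user."""
    return raw_nand_bytes / (1.0 + op_fraction)


def dwpd(tbw_bytes: float, user_capacity_bytes: float, warranty_years: float) -> float:
    """Drive Writes Per Day: rated host writes / (user capacity x warranty days)."""
    return tbw_bytes / (user_capacity_bytes * warranty_years * DAYS_PER_YEAR)


# Hypothetical figures: 1 TB raw NAND that can absorb 1.5 PB of host writes
# (P/E cycles already divided by write amplification), 5-year warranty.
RAW = 1.0e12
HOST_WRITE_BUDGET = 1.5e15
YEARS = 5

for op in (0.07, 0.28):
    cap = user_capacity(RAW, op)
    rating = dwpd(HOST_WRITE_BUDGET, cap, YEARS)
    print(f"OP {op:.0%}: {cap / 1e9:.0f} GB usable, {rating:.2f} DWPD")
```

Holding the write budget fixed is the conservative part of this model: more spare area also lowers write amplification during garbage collection, which in practice raises endurance further.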

> I am struggling to find $500 - $700 drives in stock. I am limited to a number of
> distributors, and unless it's HP, Cisco or Dell, it's pretty much not kept in
> stock. For a number of disk options I got a ship date of late June, and all 3
> distributors are indicating that SSD drives are constrained.

Yes, there is a global shortage and all major vendors are on allocation.

> I am now down to adding a single SSD as a ZIL during busy hours or when the
> alerts start rolling in, and removing the ZIL after hours or when the load
> reduces again.
> 
> My only other options for the next 3 weeks are:
> 1 - add 15K drives for ZIL and see if that helps.
> 2 - Hope for the best on the single old OCZ Talos 2

I have had bad luck with these.

> 3 - Mix SAS/SATA on the same backplane.

No guarantees, but with more modern expanders and HBAs, we see fewer problems
mixing.
I wouldn’t attempt it with 3G SAS/SATA, but 12G seems more robust.
 — richard


> 
> I was 100% banking on the ZeusRAM since that is what I could get my hands on
> immediately.
> From: Richard Elling <richard.ell...@richardelling.com>
> Sent: Monday, April 10, 2017 5:49:55 PM
> To: Machine Man
> Cc: omnios-discuss@lists.omniti.com
> Subject: Re: [OmniOS-discuss] ZeusRAM - predictive failure
>  
> 
>> On Apr 10, 2017, at 2:39 PM, Machine Man <gearbo...@outlook.com> wrote:
>> 
>> Thank you. I am sending it back to where we purchased it. I thought
>> these were no longer available, but the distributor still listed them and
>> had them in stock.
>> I was hesitant to purchase, but I am in desperate need of a ZIL.
> 
> ZeusRAMs have been EOL for a year or more. AIUI, the parts are no longer 
> available to build them.
> We do see better performance from the modern, enterprise-class, 12G SAS parts 
> from HGST and Toshiba.
> Unfortunately, they are priced by $/GB and not $/latency, so the
> smaller-capacity drives are also the slower ones.
>  — richard
> 
>> 
>> 
>> From: Richard Elling <richard.ell...@richardelling.com>
>> Sent: Monday, April 10, 2017 4:15:32 PM
>> To: Machine Man
>> Cc: omnios-discuss@lists.omniti.com
>> Subject: Re: [OmniOS-discuss] ZeusRAM - predictive failure
>>  
>> 
>>> On Apr 10, 2017, at 1:00 PM, Machine Man <gearbo...@outlook.com> wrote:
>>> 
>>> Today I received one of the two ZeusRAMs I ordered, both brand new. I was
>>> struggling to find SAS SSD drives that were available in my price range, and
>>> I desperately need to add a ZIL.
>>> I decided to order the ZeusRAMs since the distributor had one in stock, and
>>> figured I'd add it while waiting for the other one, since by design they
>>> really should not be prone to failure. I have not used them before and would
>>> normally prefer to use regular SSD drives.
>>> 
>>> I slotted the ZeusRAM in and it began to blink rapidly, the same as the disks
>>> that are currently in the pool on that backplane. Running the format command
>>> never returned a list of disks. I left it for about 15 min and pulled it,
>>> since the label on the disk says it can take up to 10 min for the caps to
>>> charge. I could see an amber and a green LED on the drive itself blinking,
>>> even after it was removed.
>>> I slotted it back in and the disk was then available. After a few min the
>>> fault light came on and the disk was unavailable due to the following:
>>> 
>>> Fault class : fault.io.disk.predictive-failure
>> 
>> This occurs when the drive either responds to an I/O with a predictive-failure
>> indication, or the periodic query of the drives sees a predicted failure. It
>> is the drive telling the OS that it thinks it will fail. There is nothing you
>> can do on the OS side to “fix” this.
>> 
>> It is possible that HGST (née STEC) can help with further diagnosis using the
>> vendor-specific log pages. Several years ago, STEC helped us root-cause a
>> failing ultracapacitor in a drive.
>> AFAIK, there is no publicly available decoder for those log pages.
>>  — richard
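For readers wondering how a SCSI drive "tells the OS" it expects to fail: SPC-4 defines additional sense code 0x5D (FAILURE PREDICTION THRESHOLD EXCEEDED), which a drive can return in sense data on an I/O or expose through the Informational Exceptions log page (page code 0x2F). This is a minimal sketch that decodes parameter 0000h of a raw copy of that page. It assumes you already have the page bytes (for example from a LOG SENSE passthrough); the function name and sample buffers are hypothetical, and this is not how illumos FMA itself is implemented. It also does not touch the vendor-specific log pages mentioned above.

```python
# Minimal sketch: decode parameter 0000h of a raw SCSI Informational Exceptions
# log page (page code 0x2F), per the SPC-4 layout. Fetching the raw bytes is
# out of scope and assumed.

IE_PAGE_CODE = 0x2F
FAILURE_PREDICTION_ASC = 0x5D  # FAILURE PREDICTION THRESHOLD EXCEEDED


def decode_ie_page(page: bytes) -> dict:
    """Return the IE ASC/ASCQ and most recent temperature from a raw IE log page."""
    # 4-byte page header + 4-byte parameter header + ASC, ASCQ, temperature
    if len(page) < 11:
        raise ValueError("buffer too short for an Informational Exceptions page")
    page_code = page[0] & 0x3F  # low 6 bits of byte 0
    if page_code != IE_PAGE_CODE:
        raise ValueError(f"unexpected page code 0x{page_code:02x}")
    param_code = int.from_bytes(page[4:6], "big")
    if param_code != 0x0000:
        raise ValueError(f"unexpected parameter code 0x{param_code:04x}")
    asc, ascq, temperature = page[8], page[9], page[10]
    return {
        "asc": asc,
        "ascq": ascq,
        "temperature_c": temperature,
        # Any 0x5D ASC is a failure prediction; 0x5D/0xFF is the "(FALSE)" test variant.
        "failure_predicted": asc == FAILURE_PREDICTION_ASC,
    }


# Hypothetical sample buffers: page header, parameter header, ASC, ASCQ, temp.
healthy = bytes([0x2F, 0x00, 0x00, 0x07, 0x00, 0x00, 0x03, 0x03, 0x00, 0x00, 0x28])
predicted = bytes([0x2F, 0x00, 0x00, 0x07, 0x00, 0x00, 0x03, 0x03, 0x5D, 0x00, 0x28])

print(decode_ie_page(healthy))
print(decode_ie_page(predicted))
```

The temperature byte 0x28 (40) in the samples is chosen only to match the drive temperature in the smartctl output quoted below.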
>> 
>> 
>>> Affects     : 
>>> dev:///:devid=id1,sd@n5000a720300b3d57//pci@0,0/pci8086,340e@7/pci1000,3040@0/iport@f0/disk@w5000a72a300b3d57,0
>>>                   faulted and taken out of service
>>> FRU         : "Slot 09" 
>>> (hc://:product-id=LSI-SAS2X36:server-id=:chassis-id=50030480178cf57f:serial=STM000****:part=STEC-ZeusRAM:revision=C025/ses-enclosure=1/bay=8/disk=0)
>>>                   faulty
>>> Description : SMART health-monitoring firmware reported that a disk
>>>               failure is imminent.
>>> 
>>> 
>>> I cleared the fault and the drive was usable again for a few minutes, then
>>> the same thing happened. Eventually the amber light on the disk itself (not
>>> the enclosure disk light) stopped blinking, and the disk was online for quite
>>> some time before the alert above reappeared.
>>> 
>>> 
>>> === START OF INFORMATION SECTION ===
>>> Vendor:               STEC
>>> Product:              ZeusRAM
>>> Revision:             C025
>>> Compliance:           SPC-4
>>> User Capacity:        8,000,000,000 bytes [8.00 GB]
>>> Logical block size:   512 bytes
>>> Rotation Rate:        Solid State Device
>>> Form Factor:          3.5 inches
>>> Logical Unit id:      0x5000a720300b3d57
>>> Serial number:        STM000******
>>> Device type:          disk
>>> Transport protocol:   SAS (SPL-3)
>>> Local Time is:        Mon Apr 10 19:17:23 2017 UTC
>>> SMART support is:     Available - device has SMART capability.
>>> SMART support is:     Enabled
>>> Temperature Warning:  Enabled
>>> === START OF READ SMART DATA SECTION ===
>>> SMART Health Status: OK
>>> Current Drive Temperature:     40 C
>>> Drive Trip Temperature:        80 C
>>> Elements in grown defect list: 0
>>> Vendor (Seagate) cache information
>>>   Blocks sent to initiator = 0
>>>   Blocks sent to initiator = 0
>>> Error counter log:
>>>            Errors Corrected by           Total   Correction     Gigabytes    Total
>>>                ECC          rereads/    errors   algorithm      processed    uncorrected
>>>            fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
>>> read:          0        0         0         0          0         21.323           0
>>> write:         0        0         0         0          0         83.809           0
>>> Non-medium error count:        0
>>> 
>>> 
>>> 
>>> Is there anything special that should be done for ZeusRAM in sd.conf? It's a
>>> two-node install and both nodes can see all the drives. I don't see any SMART
>>> errors listed, but fmadm shows the disk as faulty due to predictive failure.
>>> OmniOS r20, all patches applied.
>>> 
>>> 
>>> thanks,  
>>> _______________________________________________
>>> OmniOS-discuss mailing list
>>> OmniOS-discuss@lists.omniti.com
>>> http://lists.omniti.com/mailman/listinfo/omnios-discuss
>> --
>> 
>> richard.ell...@richardelling.com
>> +1-760-896-4422




