Oh, I forgot to mention, these drives have been in service for about 9
months.
If it's useful / interesting at all, here is the smartctl -a output from
one of the 840s I installed around the same time as the ones that failed
recently; this one has not failed yet:
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.0-33-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Samsung based SSDs
Device Model: Samsung SSD 840 PRO Series
Serial Number: S1ANNSAF800928M
LU WWN Device Id: 5 002538 5a028ebe1
Firmware Version: DXM06B0Q
User Capacity: 128,035,676,160 bytes [128 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Fri Sep 4 19:18:22 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (65476) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  15) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       6768
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       6
177 Wear_Leveling_Count     0x0013   037   037   000    Pre-fail  Always       -       2275
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   072   064   000    Old_age   Always       -       28
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       2
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       68358879670
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  255        0    65535  Read_scanning was never started
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
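Incidentally, since the self-test log above is empty: kicking off a short test is just a couple of commands. A quick sketch, with /dev/sdX standing in for the actual device:

  smartctl -t short /dev/sdX     # start a short self-test (~2 minutes per the polling time above)
  smartctl -l selftest /dev/sdX  # check the self-test log once it completes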
If I'm reading Total_LBAs_Written right, and this whitepaper is correct (
http://www.samsung.com/global/business/semiconductor/minisite/SSD/global/html/whitepaper/whitepaper07.html)
that works out to about 32.5TB written to the drive. The last time I
checked, all the drives in this cluster were about evenly worn.
Assuming that's right, and the wear rate stays constant, we should get at
least another nine months from these drives based on the endurance figures at (
http://www.samsung.com/us/pdf/memory-storage/840PRO_25_SATA_III_Spec.pdf),
which is still far less than I'd calculated when I bought them. Then again,
I assumed a much slower wear rate (1TB / month) than what we're apparently
actually seeing (3.6TB / month), so my original lifespan estimate of
about 6 years was way off.
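For anyone who wants to check my math, the back-of-the-envelope version (assuming the raw Total_LBAs_Written value counts 512-byte sectors, which is how I read the whitepaper):

  # assumes the raw value is in 512-byte sectors
  echo "68358879670 * 512 / 1024^4" | bc -l      # total written so far, in TiB (~32)
  echo "68358879670 * 512 / 1024^4 / 9" | bc -l  # wear rate in TiB/month over ~9 months (~3.5)

The exact figure obviously depends on how the whitepaper defines the attribute, so treat this as a sanity check rather than gospel.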
QH
On Fri, Sep 4, 2015 at 1:15 PM, James (Fei) Liu-SSI <
[email protected]> wrote:
> Hi Andrija,
>
> Thanks for your prompt response. Would it be possible to get a chance to
> know your hardware configuration, including your server information?
> Secondly, is there any way to duplicate your workload with fio-rbd, rbd
> bench, or rados bench?
>
>
>
> “so 2 SSDs in 3 servers vanished in... 2-3 weeks, after 3-4 months of
> being in production (VMs/KVM/CloudStack)”
>
>
>
> Do you mean that you deploy Ceph with CloudStack, am I correct? And the 2
> SSDs that vanished in 2-3 weeks were brand-new Samsung 850 Pro 128GB
> drives, right?
>
>
>
> Thanks,
>
> James
>
>
>
> *From:* Andrija Panic [mailto:[email protected]]
> *Sent:* Friday, September 04, 2015 11:53 AM
> *To:* James (Fei) Liu-SSI
> *Cc:* Quentin Hartman; ceph-users
>
> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
>
>
> Hi James,
>
>
>
> I had 3 Ceph nodes as follows: 12 OSDs (HDD) and 2 SSDs (6 journal
> partitions on each SSD) - the SSDs just vanished with no warning, no smartctl
> errors, nothing... so 2 SSDs in 3 servers vanished in... 2-3 weeks, after
> 3-4 months of being in production (VMs/KVM/CloudStack).
>
> Mine were also Samsung 850 PRO 128GB.
>
>
>
> Best,
>
> Andrija
>
>
>
> On 4 September 2015 at 19:27, James (Fei) Liu-SSI <
> [email protected]> wrote:
>
> Hi Quentin and Andrija,
>
> Thanks so much for reporting the problems with Samsung.
>
>
>
> Would it be possible to get to know the configuration of your system? What
> kind of workload are you running? You use the Samsung SSDs as separate
> journaling disks, right?
>
>
>
> Thanks so much.
>
>
>
> James
>
>
>
> *From:* ceph-users [mailto:[email protected]] *On Behalf
> Of *Quentin Hartman
> *Sent:* Thursday, September 03, 2015 1:06 PM
> *To:* Andrija Panic
> *Cc:* ceph-users
> *Subject:* Re: [ceph-users] which SSD / experiences with Samsung 843T vs.
> Intel s3700
>
>
>
> Yeah, we've ordered some S3700s to replace them already. They should be here
> early next week. Hopefully they arrive before we have multiple nodes die at
> once and can no longer rebalance successfully.
>
>
>
> Most of the drives I have are the 850 Pro 128GB (specifically
> MZ7KE128HMGA).
>
> There are a couple of 120GB 850 EVOs in there too, but ironically, none of
> them have pooped out yet.
>
>
>
> On Thu, Sep 3, 2015 at 1:58 PM, Andrija Panic <[email protected]>
> wrote:
>
> I really advise removing the bastards before they die... no rebalancing
> happening, just a temporary OSD down while replacing journals...
>
> What size and model are your Samsungs?
>
> On Sep 3, 2015 7:10 PM, "Quentin Hartman" <[email protected]>
> wrote:
>
> We also just started having our 850 Pros die one after the other after
> about 9 months of service. 3 down, 11 to go... No warning at all: the drive
> is fine, and then it's not even visible to the machine. According to the
> stats in hdparm and the calculations I did, they should have had years of
> life left, so it seems that Ceph journals definitely do something these
> drives do not like, which is not reflected in their stats.
>
>
>
> QH
>
>
>
> On Wed, Aug 26, 2015 at 7:15 AM, 10 minus <[email protected]> wrote:
>
> Hi,
>
> We got a good deal on the 843T and we are using them as journals in our
> OpenStack setup.
> They have been running for the last six months... no issues.
>
> When we compared them with Intel SSDs (I think it was the S3700), they were
> a shade slower for our workload and considerably cheaper.
>
> We did not run any synthetic benchmarks since we had a specific use case.
>
> The performance was better than our old setup, so it was good enough.
>
> hth
>
>
>
> On Tue, Aug 25, 2015 at 12:07 PM, Andrija Panic <[email protected]>
> wrote:
>
> We have some 850 Pro 256GB SSDs if anyone is interested in buying them :)
>
> And also there was a new 850 Pro firmware that broke people's disks and was
> pulled later, etc... I'm sticking to only vacuum cleaners from Samsung
> for now, maybe... :)
>
> On Aug 25, 2015 12:02 PM, "Voloshanenko Igor" <[email protected]>
> wrote:
>
> To be honest, the Samsung 850 PRO is not a 24/7 series drive... it's more of
> a desktop-plus series, but anyway - the results from these drives are very,
> very bad in any realistic scenario...
>
>
>
> Possibly the 845 PRO is better, but we don't want to experiment anymore...
> So we chose the S3500 240G. Yes, it's cheaper than the S3700 (by about 2x), and
> not as durable for writes, but we think it's better to replace 1 SSD per
> year than to pay double the price now.
>
>
>
> 2015-08-25 12:59 GMT+03:00 Andrija Panic <[email protected]>:
>
> And I should mention that in another Ceph installation we had Samsung 850
> Pro 128GB drives and all 6 SSDs died within a 2-month period - they simply
> disappeared from the system, so not a wear-out...
>
> Never again will we buy Samsung :)
>
> On Aug 25, 2015 11:57 AM, "Andrija Panic" <[email protected]> wrote:
>
> Please read this first:
>
> http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
>
> We are getting 200 IOPS compared to the Intel S3500's 18,000 IOPS - those
> are sustained performance numbers, meaning avoiding the drive's cache and
> running for a longer period of time...
> Also, if you check with fio you will get better latencies on the Intel S3500
> (the model tested in our case) along with 20x better IOPS results...
>
> We saw the original issue as high speed at the beginning of, e.g., a file
> transfer inside a VM, which then halts to zero... We moved the journals back
> to HDDs and performance was acceptable... now we are upgrading to the Intel
> S3500...
>
> Best
>
> any details on that?
>
> On Tue, 25 Aug 2015 11:42:47 +0200, Andrija Panic
> <[email protected]> wrote:
>
> > Make sure you test whatever you decide. We just learned this the hard way
> > with the Samsung 850 Pro, which is total crap, more than you could imagine...
> >
> > Andrija
> > On Aug 25, 2015 11:25 AM, "Jan Schermer" <[email protected]> wrote:
> >
> > > I would recommend the Samsung 845 DC PRO (not EVO, not just PRO).
> > > Very cheap, better than the Intel S3610 for sure (and I think it beats
> > > even the S3700).
> > >
> > > Jan
> > >
> > > > On 25 Aug 2015, at 11:23, Christopher Kunz <[email protected]>
> > > wrote:
> > > >
> > > > On 25.08.15 at 11:18, Götz Reinicke - IT Koordinator wrote:
> > > >> Hi,
> > > >>
> > > >> most of the time I get the recommendation from resellers to go with
> > > >> the Intel S3700 for journaling.
> > > >>
> > > > Check out the Intel S3610. 3 drive writes per day for 5 years. Plus, it
> > > > is cheaper than the S3700.
> > > >
> > > > Regards,
> > > >
> > > > --ck
> > >
>
>
>
> --
> Mariusz Gronczewski, Administrator
>
> Efigence S. A.
> ul. Wołoska 9a, 02-583 Warszawa
> T: [+48] 22 380 13 13
> F: [+48] 22 380 13 14
> E: [email protected]
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
>
>
> Andrija Panić
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com