Hi,

again, as I said, in normal operation everything is fine with SMR. They perform well, in particular for large sequential writes, because of the on-platter cache (20 GB, I think). All the tests we did used good SSDs for the OSD cache.

Things blow up during backfill/recovery because the SMR disks saturate and then slow down to some 2-3 IOPS.

Cache tiering will not work either, because if one or more of the SMR disks in the backend fail, backfilling/recovery will again slow things down to almost no client I/O.

And yes, we tried all sorts of throttling mechanisms during backfilling/recovery.
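
To give an idea what that throttling looks like, these are the usual recovery/backfill knobs; the values below are purely illustrative, set in the [osd] section of ceph.conf or injected at runtime:

    [osd]
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery op priority = 1

    # change at runtime on all OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'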

In all cases we tested, the cluster was useless from the client side during backfilling/recovery.

- mike

On 2/19/17 9:54 AM, Wido den Hollander wrote:

On 18 February 2017 at 17:03, rick stehno <rs3...@me.com> wrote:


I work for Seagate and have done over a hundred tests using 8TB SMR disks in
a cluster. Whether SMR HDDs are the best choice depends entirely on your
access pattern. Remember that SMR HDDs don't perform well doing random writes,
but are excellent for reads and sequential writes.
In many of my tests I added an SSD or PCIe flash card to hold the journals,
and the SMR disks performed better than typical CMR disks while being cheaper
overall than an all-CMR setup. You can also use some type of caching, like a
Ceph cache tier, with very good results.
By placing the journals on flash or adopting some form of caching you
eliminate the double writes to the SMR HDDs, and performance should be fine. I
have test results if you would like to see them.
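
A rough sketch of those two setups; the device and pool names here are placeholders only:

    # journal on flash: ceph-disk takes the data device first, then the
    # journal device (/dev/sdb = SMR data disk, /dev/nvme0n1p1 = flash partition)
    ceph-disk prepare /dev/sdb /dev/nvme0n1p1

    # cache tier in front of a slow pool ('coldpool' and 'hotpool' are placeholders)
    ceph osd tier add coldpool hotpool
    ceph osd tier cache-mode hotpool writeback
    ceph osd tier set-overlay coldpool hotpool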

I am really keen on seeing those numbers. The blog post I wrote ( 
https://blog.widodh.nl/2017/02/do-not-use-smr-disks-with-ceph/ ) is 
based on two occasions where people bought 6TB and 8TB Seagate SMR disks and 
used them in Ceph.

One use case was an application writing natively to RADOS, the other was CephFS.

On both occasions the journals were on SSD, but the backing disk would still be 
saturated very easily. Ceph still does random writes on the disk for things 
like updating PGLogs, writing new OSDMaps, etc.
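
That saturation is easy to watch on the OSD node while recovery runs, e.g. with iostat (sdb is a placeholder for the SMR data disk):

    iostat -x sdb 1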

A large sequential write into Ceph might be split up by either CephFS or RBD 
into smaller writes to various RADOS objects.
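
For example, with RBD's default 4 MB object size a 1 GB sequential client write ends up spread over 256 RADOS objects. The object size of an existing image can be checked with ('rbd/myimage' is a placeholder):

    rbd info rbd/myimage    # look at the 'order' / object size line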

I haven't seen a use case where SMR disks perform 'OK' with Ceph at all. That's 
why my advice is still to stay away from those disks for Ceph.

In both cases my customers had to spend a lot of money buying new disks to 
make it work. The first case was actually somebody who bought 1000 SMR disks 
and then found out they didn't work with Ceph.

Wido


Rick
Sent from my iPhone, please excuse any typing errors.

On Feb 17, 2017, at 8:49 PM, Mike Miller <millermike...@gmail.com> wrote:

Hi,

don't go there. We tried this with SMR drives, which slow down to 
somewhere around 2-3 IOPS during backfilling/recovery, and that renders the 
cluster useless for client I/O. Things might change in the future, but for now 
I would strongly recommend against SMR.

Go for normal SATA drives with only slightly higher price/capacity ratios.

- mike

On 2/3/17 2:46 PM, Stillwell, Bryan J wrote:
On 2/3/17, 3:23 AM, "ceph-users on behalf of Wido den Hollander"
<ceph-users-boun...@lists.ceph.com on behalf of w...@42on.com> wrote:

On 3 February 2017 at 11:03, Maxime Guyot
<maxime.gu...@elits.com> wrote:


Hi,

Interesting feedback!

  > In my opinion the SMR disks can be used exclusively for RGW.
  > Unless it's something like a backup/archive cluster or pool with
  > little to no concurrent R/W access, you're likely to run out of IOPS
  > (again) long before filling these monsters up.

That's exactly the use case I am considering those archive HDDs for:
something like AWS Glacier, a form of offsite backup, probably via
radosgw. The classic Seagate enterprise-class HDDs provide "too much"
performance for this use case; I could live with 1/4 of the performance
for that price point.


If you go down that route I suggest that you make a mixed cluster for RGW.

A (small) set of OSDs running on top of proper SSDs, e.g. Samsung SM863 or
PM863, or an Intel DC series.

All pools by default should go to those OSDs.

Only the RGW bucket data pool should go to the big SMR drives. However,
again, expect very, very low performance from those disks.
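
Roughly, that split comes down to a separate CRUSH root and rule for the SMR OSDs and pointing only the bucket data pool at that rule. All names below are placeholders, and the exact pool option name depends on the Ceph release:

    # separate root for the SMR OSDs; move their hosts under it
    ceph osd crush add-bucket smr root
    ceph osd crush move smr-host-1 root=smr

    # a rule that only selects OSDs under that root
    ceph osd crush rule create-simple smr_rule smr host

    # point only the RGW bucket data pool at that rule
    # (rule id from 'ceph osd crush rule dump')
    ceph osd pool set default.rgw.buckets.data crush_ruleset <rule-id>
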
One of the other concerns you should think about is recovery time when one
of these drives fails.  The more OSDs you have, the less of an issue this
becomes, but on a small cluster it might take over a day to fully recover
from an OSD failure, which is a decent amount of time to have degraded
PGs.
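
As a rough back-of-the-envelope (assumed numbers, not measurements): an 8 TB OSD that was about 70% full holds roughly 5.6 TB that has to be re-replicated, and if the surviving disks can absorb an aggregate of about 60 MB/s of backfill, that works out to

    5.6 TB / 60 MB/s  ~=  93,000 s  ~=  26 hours

so "over a day" is easily reached on a small cluster.
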
Bryan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
