Adding more nodes is best if you have unlimited budget :) You should add more
OSDs per node until you start hitting CPU or network bottlenecks. Use a perf
tool like atop/sysstat to know when this happens.
-------- Original message --------
From: kevin parrikar <[email protected]>
Date: 07/01/2017 19:56 (GMT+02:00)
To: Lionel Bouton <[email protected]>
Cc: [email protected]
Subject: Re: [ceph-users] Analysing ceph performance with SSD journal, 10gbe
NIC and 2 replicas -Hammer release
Wow, that's a lot of good information. I wish I had known about all this before
investing in all these devices. Since I don't have any other option, I will get
better SSDs and faster HDDs.
I have one more generic question about Ceph.
To increase the throughput of a cluster, what is the standard practice: more
OSDs "per" node, or more OSD "nodes"?
Thanks a lot for all your help. Learned so many new things, thanks again.
Kevin
On Sat, Jan 7, 2017 at 7:33 PM, Lionel Bouton <[email protected]>
wrote:
On 07/01/2017 at 14:11, kevin parrikar
wrote:
Thanks for your valuable input.
We were using these SSDs in our NAS box (Synology), and they were
giving 13k IOPS for our fileserver in RAID1. We had a few spare
disks which we added to our Ceph nodes, hoping they would give
good performance like the NAS box. (I am not comparing NAS
with Ceph, just explaining why we decided to use these SSDs.)
We don't have S3520 or S3610 at
the moment but can order one of these to see how it performs
in Ceph. We have 4x S3500 80GB handy.
If I create a 2-node cluster with 2x S3500 each and a
replica count of 2, do you think it can deliver 24MB/s of 4k writes?
Probably not. See
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
According to the page above, the DC S3500 reaches 39MB/s. Its
capacity isn't specified; yours are 80GB only, which is the lowest
capacity I'm aware of, and for all DC models I know of the speed goes
down with the capacity, so you will probably get less than that.
If you put both data and journal on the same device you cut your
bandwidth in half: so this would give you an average <20MB/s per
OSD (with occasional peaks above that if you don't have a sustained
20MB/s). With 4 OSDs and size=2, your total write bandwidth is
<40MB/s. For a single stream of data you will only get <20MB/s
though (you won't benefit from parallel writes to the 4 OSDs and
will only write to 2 at a time).
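The arithmetic above can be sketched as a quick back-of-the-envelope check, taking the 39 MB/s figure from the linked journal-test page as the starting point (the 80GB model will likely be slower, so treat these as upper bounds):

```python
# Rough upper-bound estimate of cluster write bandwidth, per the thread:
# 4 OSDs on DC S3500 SSDs, journal and data colocated, size=2.
ssd_write_mb_s = 39.0                 # sequential-write figure for the DC S3500
per_osd = ssd_write_mb_s / 2          # journal + data share one device -> halved
n_osds = 4
size = 2                              # replication factor (size=2)

cluster_total = per_osd * n_osds / size

print(per_osd)        # 19.5 -> the "<20MB/s per OSD" above
print(cluster_total)  # 39.0 -> the "<40MB/s" cluster total
```

A single client stream still tops out at the per-OSD figure, since it only writes to one primary (plus its replica) at a time.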
Note that by comparison the 250GB 840 EVO only reaches 1.9MB/s.
But even if you reach the 40MB/s, these models are not designed for
heavy writes; you will probably kill them long before their warranty
expires (IIRC these are rated for ~24GB of writes per day over the
warranty period). In your configuration you only have to write 24GB
to the cluster each day (as you have 4 of them, write to both data
and journal, and use size=2) to be in this situation (that is an
average of only 0.28 MB/s, compared to your 24 MB/s target).
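As a sanity check on the 0.28 MB/s figure: with the journal on the data SSD every byte is written twice, and size=2 doubles it again, so the four rated endurances divide by an amplification factor of four:

```python
# Verifying the endurance math: 4 SSDs rated ~24GB of writes/day each,
# every cluster byte written twice for the journal and twice for size=2.
rated_gb_day = 24                 # per-SSD warranty rating (from the thread)
n_ssds = 4
amplification = 2 * 2             # journal+data on same SSD x replication

cluster_gb_day = rated_gb_day * n_ssds / amplification   # 24.0 GB/day total
avg_mb_s = cluster_gb_day * 1000 / 86400                 # spread over a day

print(round(avg_mb_s, 2))   # 0.28
```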
We bought the S3500
because last time, when we tried Ceph, people were suggesting
this model :) :)
The 3500 series might be enough with the higher capacities in some
rare cases but the 80GB model is almost useless.
You have to do the math considering:
- how much you will write to the cluster (guess high if you have to
guess),
- if you will use the SSD for both journals and data (which means
writing twice on them),
- your replication level (which means you will write multiple times
the same data),
- when you expect to replace the hardware,
- the amount of writes per day they support under warranty (if the
manufacturer doesn't present this number prominently they probably
are trying to sell you a fast car headed for a brick wall)
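The checklist above can be folded into one small function. This is only a sketch of the math, with illustrative inputs (the function name and the example numbers are mine, not from the thread):

```python
# Hedged sketch of the endurance budgeting described above.
def ssd_writes_per_day_gb(cluster_writes_gb_day, replicas,
                          journal_on_data_ssd, n_ssds):
    """GB actually written to each SSD per day, after write amplification
    from replication and (optionally) a colocated journal."""
    amplification = replicas * (2 if journal_on_data_ssd else 1)
    return cluster_writes_gb_day * amplification / n_ssds

# Example: guess high at 100 GB/day to the cluster, size=2,
# journals colocated with data, spread over 4 SSDs.
per_ssd = ssd_writes_per_day_gb(100, 2, True, 4)
rated = 24                        # warranty rating in GB/day per SSD

print(per_ssd, per_ssd <= rated)  # 100.0 False -> these SSDs are undersized
```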
If your hardware can't handle the amount of writes you expect to put
into it, then you are screwed. There have been reports of new Ceph
users who weren't aware of this and used cheap SSDs that all failed
at the same time, in a matter of months. You definitely don't want to
be in their position.
In fact, as problems happen (hardware failure leading to cluster
storage rebalancing, for example), you should probably get a system
able to handle 10x the amount of writes you expect it to handle, then
monitor the SSD SMART attributes to be alerted long before they
die, and replace them before problems happen. You definitely want a
controller that allows access to this information. If you can't get
it, you will have to monitor the writes and guess this value, which
is risky, as write amplification inside SSDs is not easy to guess...
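On Intel DC-series drives the relevant attribute in `smartctl -A` output is typically ID 233 (Media_Wearout_Indicator), a normalized value that starts at 100 and counts down. A minimal parsing sketch, assuming that attribute ID and the standard smartctl column layout (other vendors use different IDs and names, and the sample line below is illustrative):

```python
# Hedged sketch: pull the normalized wearout value out of `smartctl -A`
# output for an Intel SSD. Alert well before it approaches its threshold.
sample = """\
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0
"""

def wearout(smart_output):
    """Return the normalized VALUE of SMART attribute 233, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "233":
            return int(fields[3])   # VALUE column (100 = new, counts down)
    return None

print(wearout(sample))  # 97
```

In practice you would feed this the output of `smartctl -A /dev/sdX` from a monitoring job and alert on a chosen floor (say, below 30).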
Lionel
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com