On Mon, Aug 21, 2017 at 1:58 PM, Christian Balzer <[email protected]> wrote:

> On Mon, 21 Aug 2017 13:40:29 +0800 Nick Tan wrote:
>
> > Hi all,
> >
> > I'm in the process of building a ceph cluster, primarily to use cephFS.
> At
> > this stage I'm in the planning phase and doing a lot of reading on best
> > practices for building the cluster, however there's one question that I
> > haven't been able to find an answer to.
> >
> > Is it better to use many hosts with single OSD's, or fewer hosts with
> > multiple OSD's?  I'm looking at using 8 or 10TB HDD's as OSD's and hosts
> > with up to 12 HDD's.  If a host dies, that means up to 120TB of data will
> > need to be recovered if the host has 12 x 10TB HDD's.  But if smaller
> hosts
> > with single HDD's are used then a single host failure will result in
> only a
> > maximum of 10TB to be recovered, so in this case it looks better to use
> > smaller hosts with single OSD's if the failure domain is the host.
> >
> > Are there other benefits or drawbacks of using many small servers with
> > single OSD's vs fewer large servers with lots of OSD's?
> >
>
> Ideally you'll have smallish hosts with smallish disks (not 10TB monsters),
> both to reduce the impact an OSD or host loss would have as well as
> improving IOPS (more spindles).
>
> With larger hosts you'll also want to make sure that a single host failure
> is not going to create a "full" (and thus unusable) cluster, besides the
> I/O strain recovery will cause.
> 5 or 10 hosts are a common, typical starting point.
>
> Also important to remember is the configuration parameter
> "mon_osd_down_out_subtree_limit = host"
> since repairing a large host is likely to be faster than replicating all
> the data it held.
>
> Of course "ideally" tends to mean "expensive" in most cases and this is
> no exception.
>
> Smaller hosts are more expensive in terms of space and parts (a NIC for
> each OSD instead of one per 12, etc).
> And before you mention really small hosts with 1GbE NICs: the latency
> penalty there is significant, and the ~100MB/s ceiling is more of an
> issue for reads than for writes.
>
> Penultimately you need to balance your budget, rack space and needs.
>
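
To put a rough number on the "full cluster" point above, here is a minimal
sanity check with assumed figures (5 hosts, 12 x 10TB disks, 60% utilisation,
Ceph's default 0.95 full ratio; none of these numbers come from the thread):

```python
# Can the cluster absorb the loss of one 12 x 10TB host without
# the surviving hosts hitting the full ratio after recovery?
hosts = 5
disks_per_host = 12
disk_tb = 10.0
full_ratio = 0.95      # Ceph's default mon_osd_full_ratio
used_fraction = 0.60   # assumed current utilisation

raw_tb = hosts * disks_per_host * disk_tb            # 600 TB raw
used_tb = raw_tb * used_fraction                     # 360 TB of data
surviving_tb = (hosts - 1) * disks_per_host * disk_tb  # 480 TB left

# After recovery, all data must fit on the surviving hosts
# below the full ratio.
post_recovery_fill = used_tb / surviving_tb
print(post_recovery_fill)               # 0.75
print(post_recovery_fill < full_ratio)  # True -- this cluster survives
```

With only 3 or 4 hosts, or higher utilisation, the same arithmetic quickly
pushes the surviving hosts past the full ratio, which is why a handful of
large hosts is a riskier starting point than more, smaller ones.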


Thanks Christian.  The tip about the "mon_osd_down_out_subtree_limit =
host" setting is very useful.  If we go down the path of large servers (12+
disks), my intention is to keep a spare empty chassis, so that if a server
fails I can move its disks into the spare and bring them back online, which
should be much faster than recovering 12 OSDs.  That was my main concern
with the large servers, and this helps alleviate it.  Thanks!
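
For reference, a minimal sketch of where that setting would live (an
illustrative ceph.conf fragment, not copied from any production config):

```ini
# ceph.conf on the monitor hosts -- illustrative fragment
[mon]
# Don't automatically mark OSDs "out" when an entire host goes down;
# an operator can then repair the server or swap the disks into a
# spare chassis instead of triggering a full re-replication of
# every OSD the host held.
mon_osd_down_out_subtree_limit = host
```

Individual OSD failures below the host level are still marked out and
recovered automatically as usual.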
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com