We're looking to deploy Ceph on about 8 Dell servers to start, each of
which typically contains 6 to 8 hard disks behind PERC RAID controllers
that support write-back cache (~512 MB usually). Most machines have between
32 and 128 GB of RAM. Our questions are as follows. Please feel free to
comment on even just one of the questions below if that's your area of
expertise/interest.
1. Various "best practice" guides suggest putting the OS on a separate
disk. But we thought that would not be good, because we'd sacrifice a whole
disk on each machine (~3 TB), or even two whole disks (~6 TB) if we did a
hardware RAID 1 for it. So, do people normally just sacrifice one whole
disk? Specifically, we came up with this idea:
   1. We set up all hard disks as "pass-through" in the RAID controller,
so that the controller's write-back cache is still in effect, but the OS
sees just a bunch of disks (6 to 8 in our case).
   2. We then do a SOFTWARE-based RAID 1 (using CentOS 6.4) for the OS
across all 6 to 8 hard disks.
   3. We then do a SOFTWARE-based RAID 0 (using CentOS 6.4) for the
swap space.
   4. *Does anyone see any flaws in our idea above? We think that RAID 1
is not computationally expensive for the machines to compute, and most of
the time the OS should be in RAM. Similarly, we think RAID 0 should be
easy for the CPU to compute, and hopefully we won't hit swap much if we
have enough RAM. And this way, we don't sacrifice 1 or 2 whole disks for
just the OS.*
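For concreteness, here is roughly what that layout would look like with
mdadm on CentOS 6. This is only a sketch under our assumptions: device
names (/dev/sda through /dev/sdf) and partition numbers are hypothetical,
and each disk is pre-partitioned with a small OS partition and a small
swap partition, leaving the bulk of the disk for the OSD.

```shell
# Hypothetical sketch: 6 pass-through disks /dev/sd[a-f], each with
#   partition 1 (~20 GB) for the OS and partition 2 (~4 GB) for swap.
# Device names and sizes are assumptions, not tested advice.

# Software RAID 1 across all six small OS partitions,
# so any single surviving disk can still boot the machine:
mdadm --create /dev/md0 --level=1 --raid-devices=6 /dev/sd[a-f]1

# Software RAID 0 across the six small swap partitions (striped swap):
mdadm --create /dev/md1 --level=0 --raid-devices=6 /dev/sd[a-f]2
mkswap /dev/md1 && swapon /dev/md1
```

One alternative worth noting for step 3: instead of an md RAID 0, you can
make each swap partition a plain swap device and give them all the same
priority in /etc/fstab; the kernel stripes across equal-priority swap
devices on its own, with no md layer to fail.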
2. Based on the performance benchmark blog post by Mark Nelson (
http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/),
has anything substantially changed since then? Specifically, it suggests
that SSDs may not really be necessary if one has RAID controllers with
write-back cache. Is this still true, even though the article was written
against a version of Ceph that is now over a year old? (Mark suggests that
things may change with newer versions of Ceph.)
3. Based on our understanding, it would seem that Ceph can deliver very
high throughput (especially for reads) if dozens and dozens of hard disks
are being accessed simultaneously across multiple machines. So we could
see several GB/s of aggregate throughput, right? (Ceph never advertises
the read-throughput advantage of its distributed architecture, so I'm
wondering if I'm missing something.) If so, is it reasonable to assume
that one common bottleneck is the ethernet? So if we only use 1 NIC at
1 Gb/s, that'll be a major bottleneck? If so, we're thinking of trying to
"bond" multiple 1 Gb/s ethernet cards into a "bonded" ethernet connection
of 4 Gb/s (4 x 1 Gb/s). But we didn't see anyone discuss this strategy.
Are there any holes in it? Or does Ceph "automatically" take advantage of
multiple NICs without us having to deal with the complexity (and the
expense of buying a new switch that supports bonding) of doing bonding?
That is, is it possible, and a good idea, to set up Ceph OSDs to use
specific NICs, so that we spread the load? (We read the recommendation of
having different NICs for front-end traffic vs. back-end traffic, but
we're not worried about network attacks -- so we're thinking that just
creating one "big fat" ethernet pipe gives us the most flexibility.)
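To make the bottleneck argument concrete, here is our back-of-envelope
arithmetic. The per-disk and per-NIC figures are assumptions (roughly
100 MB/s sequential per spinning disk, ~125 MB/s of payload per 1 Gb/s
link), not measurements:

```python
# Back-of-envelope bottleneck estimate. All figures are assumptions:
# ~100 MB/s sequential read per spinning disk, 8 nodes x 7 disks,
# ~125 MB/s usable payload per 1 Gb/s NIC.
DISK_MB_S = 100
NODES = 8
DISKS_PER_NODE = 7
GBE_MB_S = 125

disk_bw_node = DISK_MB_S * DISKS_PER_NODE    # 700 MB/s of raw disk per node
disk_bw_cluster = disk_bw_node * NODES       # 5600 MB/s (~5.6 GB/s) cluster-wide

nic_single = GBE_MB_S                        # 125 MB/s: far below the disks
nic_bonded_4x = 4 * GBE_MB_S                 # 500 MB/s: still below 700 MB/s

print(f"per-node disk: {disk_bw_node} MB/s, "
      f"single NIC: {nic_single} MB/s, 4x bond: {nic_bonded_4x} MB/s")
```

So even a 4x bond (~500 MB/s) sits below the assumed per-node disk
bandwidth (~700 MB/s), which supports the "ethernet is the bottleneck"
intuition. One caveat on bonding itself: Linux bonding modes such as
802.3ad (LACP) balance traffic per-flow, so any single TCP connection
still tops out at 1 Gb/s; the full 4 Gb/s only materializes across many
concurrent flows. Since Ceph opens many connections between clients and
OSDs, it should generally be able to exploit the bond, but it's worth
knowing.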
4. I'm a little confused -- does Ceph support incremental snapshots of
either VMs or CephFS? I saw this statement in the release notes for the
"dumpling" release (http://ceph.com/docs/master/release-notes/#v0-67-dumpling):
"The MDS now disallows snapshots by default as they are not considered
stable. The command ‘ceph mds set allow_snaps’ will enable them." So,
should I assume that we can't do incremental file-system snapshots in a
stable fashion until further notice?
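(For the VM side of the question: VM images on RBD have their own snapshot
mechanism, separate from CephFS snapshots, including incremental diffs
between snapshots. A hedged sketch -- the pool name "rbd" and image name
"vm1" are hypothetical:

```shell
# Hypothetical pool/image names; RBD snapshots are independent of CephFS.
rbd snap create rbd/vm1@snap1
# ... the image changes over time ...
rbd snap create rbd/vm1@snap2
# Export only the changes between the two snapshots (an incremental diff):
rbd export-diff --from-snap snap1 rbd/vm1@snap2 vm1-snap1-to-snap2.diff
```

So even if CephFS snapshots stay off by default, incremental snapshots of
RBD-backed VM images may already cover part of our use case.)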
-Sidharta
--
*Gautam Saxena *
President & CEO
Integrated Analysis Inc.
Making Sense of Data.™
Biomarker Discovery Software | Bioinformatics Services | Data Warehouse
Consulting | Data Migration Consulting
www.i-a-inc.com <http://www.i-a-inc.com/>
[email protected]
(301) 760-3077 office
(240) 479-4272 direct
(301) 560-3463 fax
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com