Using rados bench. It's just a test pool anyway. I will stick with my current OSD setup (16 HDDs, 4 SSDs, for a 1:4 ratio of SSDs to HDDs). I can get > 800 MB/s write and about 1 GB/s read.
On Mon, Apr 20, 2015 at 11:19 AM, Mark Nelson <[email protected]> wrote:
> How are you measuring the 300 MB/s and 184 MB/s? I.e., is it per drive, or
> the client throughput? Also, what controller do you have? We've seen some
> controllers from certain manufacturers start to top out at around 1-2 GB/s
> with write cache enabled.
>
> Mark
>
> On 04/20/2015 11:15 AM, Barclay Jameson wrote:
>> I have an SSD pool for testing (only 8 drives), but when I do 1 SSD with
>> journal and 1 SSD with data I get > 300 MB/s write. When I change all 8
>> disks to house the journal I get < 184 MB/s write.
>>
>> On Mon, Apr 20, 2015 at 10:16 AM, Mark Nelson <[email protected]> wrote:
>>> The big question is how fast these drives can do O_DSYNC writes. The
>>> basic gist of this is that for every write to the journal, an
>>> ATA_CMD_FLUSH call is made to ensure that the device (or potentially
>>> the controller) knows that this data really needs to be stored safely
>>> before the flush is acknowledged. How this gets handled is really
>>> important.
>>>
>>> 1) If devices have limited or no power loss protection, they need to
>>> flush the contents of any caches to non-volatile memory. How quickly
>>> this can happen depends on a lot of factors, but even on SSDs it may be
>>> slow enough to limit performance greatly relative to how quickly writes
>>> could proceed if uninterrupted.
>>>
>>> * It's very important to note that some devices that lack power loss
>>> protection may simply *ignore* ATA_CMD_FLUSH and return immediately so
>>> as to appear fast, even though this means that data may become corrupt.
>>> Be very careful putting journals on devices that do this!
>>>
>>> ** Some devices that have claimed to have power loss protection don't
>>> actually have capacitors big enough to flush data from cache. This has
>>> led to huge amounts of confusion, and you have to be very careful.
>>> For a specific example, see the section titled "The Truth About
>>> Micron's Power-Loss Protection" here:
>>>
>>> http://www.anandtech.com/show/8528/micron-m600-128gb-256gb-1tb-ssd-review-nda-placeholder
>>>
>>> 2) Devices that feature proper power loss protection, such that caches
>>> can be flushed in the event of power failure, can safely ignore
>>> ATA_CMD_FLUSH and return immediately when it is called. This greatly
>>> improves the performance of Ceph journal writes and usually allows the
>>> journal to perform at or near the theoretical sequential write
>>> performance of the device.
>>>
>>> 3) Some controllers may be able to intercept these calls and return
>>> immediately on ATA_CMD_FLUSH if they have an on-board BBU that
>>> functions the same way PLP on the drives would. Unfortunately, on many
>>> controllers this is tied to enabling writeback cache and running the
>>> drives in some kind of RAID mode (single-disk RAID0 LUNs are often used
>>> for Ceph OSDs with this kind of setup). In some cases the controller
>>> itself can become a bottleneck with SSDs, so it's important to test
>>> this out and make sure it works well in practice.
>>>
>>> Regarding the 840 EVO, user reports suggest that it does not have PLP
>>> and does flush data on ATA_CMD_FLUSH, resulting in quite a bit slower
>>> performance when doing O_DSYNC writes. Unfortunately we don't have any
>>> in the lab we can test, but this is likely why you are seeing slower
>>> write performance when journals are placed on the SSD.
>>>
>>> Mark
>>>
>>> On 04/20/2015 09:48 AM, J-P Methot wrote:
>>>> My journals are on-disk, each disk being an SSD. The reason I didn't
>>>> go with dedicated drives for journals is that when designing the
>>>> setup, I was told that having dedicated journal SSDs on a full-SSD
>>>> setup would not give me any performance increase.
>>>>
>>>> So that makes the journal disk to data disk ratio 1:1.
>>>>
>>>> The replication size is 3, yes. The pools are replicated.
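Mark's O_DSYNC point above can be sanity-checked without a Ceph cluster at all. A minimal sketch in Python (the path below is a stand-in; point it at a file on the drive being evaluated): it times the same 4 KiB sequential write loop once through the page cache and once with O_DSYNC, which roughly mimics the journal's flush-per-write behaviour as seen by the drive. Drives that actually honour flushes will show a large gap between the two numbers.

```python
import os
import time

# Scratch file; point this at the drive under test (hypothetical path).
PATH = "/tmp/dsync-test.bin"
BLOCK = b"\0" * 4096
COUNT = 500

def mb_per_sec(extra_flags):
    """Time COUNT sequential 4 KiB writes with the given extra open(2) flags."""
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o600)
    t0 = time.perf_counter()
    for _ in range(COUNT):
        os.write(fd, BLOCK)
    os.close(fd)
    return COUNT * len(BLOCK) / (time.perf_counter() - t0) / 1e6

buffered = mb_per_sec(0)            # page-cache writes: fast on any device
dsync = mb_per_sec(os.O_DSYNC)      # flush per write, like the Ceph journal
print(f"buffered: {buffered:.1f} MB/s, O_DSYNC: {dsync:.1f} MB/s")
os.remove(PATH)
```

A drive that ignores flushes (as some consumer SSDs do) will report suspiciously similar numbers for both runs; that is a warning sign, not a feature, per the caveats above.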
>>>> On 4/20/2015 10:43 AM, Barclay Jameson wrote:
>>>>> Are your journals on separate disks? What is your ratio of journal
>>>>> disks to data disks? Are you doing replication size 3?
>>>>>
>>>>> On Mon, Apr 20, 2015 at 9:30 AM, J-P Methot <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This is similar to another thread running right now, but since our
>>>>>> current setup is completely different from the one described in the
>>>>>> other thread, I thought it might be better to start a new one.
>>>>>>
>>>>>> We are running Ceph Firefly 0.80.8 (soon to be upgraded to 0.80.9).
>>>>>> We have 6 OSD hosts with 16 OSDs each (so a total of 96 OSDs). Each
>>>>>> OSD is a Samsung SSD 840 EVO on which I can reach write speeds of
>>>>>> roughly 400 MB/s, plugged in JBOD on a controller that can
>>>>>> theoretically transfer at 6 Gb/s. All of that is linked to OpenStack
>>>>>> compute nodes on two bonded 10 Gbps links (so a max transfer rate of
>>>>>> 20 Gbps).
>>>>>>
>>>>>> When I run rados bench from the compute nodes, I reach the network
>>>>>> cap in read speed. However, write speeds are vastly inferior,
>>>>>> reaching about 920 MB/s. If I have 4 compute nodes running the write
>>>>>> benchmark at the same time, I can see the number plummet to
>>>>>> 350 MB/s. For our planned usage, we find that rather slow,
>>>>>> considering we will run a high number of virtual machines in there.
>>>>>>
>>>>>> Of course, the first thing to do would be to move the journal to
>>>>>> faster drives. However, these are SSDs we're talking about; we don't
>>>>>> really have access to faster drives. I must find a way to get better
>>>>>> write speeds, so I am looking for suggestions as to how to make it
>>>>>> faster.
>>>>>>
>>>>>> I have also thought of some options myself, like:
>>>>>> - Upgrading to the latest stable Hammer version (would that really
>>>>>> give me a big performance increase?)
>>>>>> - Crush map modifications?
>>>>>> (This is a long shot, but I'm still using the default crush map;
>>>>>> maybe there's a change there I could make to improve performance.)
>>>>>>
>>>>>> Any suggestions as to anything else I can tweak would be strongly
>>>>>> appreciated.
>>>>>>
>>>>>> For reference, here's part of my ceph.conf:
>>>>>>
>>>>>> [global]
>>>>>> auth_service_required = cephx
>>>>>> filestore_xattr_use_omap = true
>>>>>> auth_client_required = cephx
>>>>>> auth_cluster_required = cephx
>>>>>> osd pool default size = 3
>>>>>>
>>>>>> osd pg bits = 12
>>>>>> osd pgp bits = 12
>>>>>> osd pool default pg num = 800
>>>>>> osd pool default pgp num = 800
>>>>>>
>>>>>> [client]
>>>>>> rbd cache = true
>>>>>> rbd cache writethrough until flush = true
>>>>>>
>>>>>> [osd]
>>>>>> filestore_fd_cache_size = 1000000
>>>>>> filestore_omap_header_cache_size = 1000000
>>>>>> filestore_fd_cache_random = true
>>>>>> filestore_queue_max_ops = 5000
>>>>>> journal_queue_max_ops = 1000000
>>>>>> max_open_files = 1000000
>>>>>> osd journal size = 10000
>>>>>>
>>>>>> --
>>>>>> ======================
>>>>>> Jean-Philippe Méthot
>>>>>> Administrateur système / System administrator
>>>>>> GloboTech Communications
>>>>>> Phone: 1-514-907-0050
>>>>>> Toll Free: 1-(888)-GTCOMM1
>>>>>> Fax: 1-(514)-907-0750
>>>>>> [email protected]
>>>>>> http://www.gtcomm.net
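One concrete tweak suggested by the ceph.conf above: with 96 OSDs and replication size 3, `osd pool default pg num = 800` is low by the usual rule of thumb of roughly 100 PGs per OSD, divided by the replica count and rounded up to a power of two. A small sketch of that guideline follows; the helper name is mine and the 100-PGs-per-OSD target is the common community guideline, not something stated in this thread.

```python
def suggested_pg_num(num_osds: int, replicas: int, target_per_osd: int = 100) -> int:
    """Rule-of-thumb PG count: ~target_per_osd PGs per OSD,
    divided by the replica count, rounded up to a power of two."""
    raw = num_osds * target_per_osd / replicas
    pg = 1
    while pg < raw:
        pg *= 2
    return pg

# The cluster described in the thread: 96 OSDs, replication size 3.
print(suggested_pg_num(96, 3))  # → 4096, vs. the configured 800
```

Too few PGs can leave data unevenly balanced across OSDs, so some drives saturate while others idle; it is one of the cheaper things to rule out before reworking hardware.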
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
