Also, please search the ML for min_size=1. Curiously you're doing it with size=3.
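The size/min_size interplay being raised here can be sketched as follows. This is not Ceph code, just an illustration of the rule as commonly described: a replicated pool with size=3 keeps three copies, while min_size sets how many copies must be up before a PG accepts I/O.

```python
def pg_accepts_io(available_replicas: int, min_size: int) -> bool:
    """A PG serves I/O only while at least min_size replicas are up.

    With size=3, min_size=1 the pool keeps serving I/O even with two
    replicas down -- convenient, but risky: the single surviving copy
    can accept writes that are lost if that OSD also fails.
    """
    return available_replicas >= min_size

# size=3, min_size=1: still writable with one surviving replica
print(pg_accepts_io(1, 1))  # True
# min_size=2 would block I/O in the same situation
print(pg_accepts_io(1, 2))  # False
```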
On Mon, Jul 3, 2017, 8:33 AM Christian Balzer <[email protected]> wrote:
>
> Hello,
>
> On Mon, 3 Jul 2017 14:18:27 +0200 Mateusz Skała wrote:
>
> > @Christian, thanks for the quick answer, please look below.
> >
> > > -----Original Message-----
> > > From: Christian Balzer [mailto:[email protected]]
> > > Sent: Monday, July 3, 2017 1:39 PM
> > > To: [email protected]
> > > Cc: Mateusz Skała <[email protected]>
> > > Subject: Re: [ceph-users] Cache Tier or any other possibility to
> > > accelerate RBD with SSD?
> > >
> > > Hello,
> > >
> > > On Mon, 3 Jul 2017 13:01:06 +0200 Mateusz Skała wrote:
> > >
> > > > Hello,
> > > >
> > > > We are using cache-tier in Read-forward mode (replica 3) to
> > > > accelerate reads, and journals on SSD to accelerate writes.
> > >
> > > OK, lots of things wrong with this statement, but firstly, the Ceph
> > > version (it is relevant) and more details about your setup and the
> > > SSDs used would be interesting and helpful.
> > >
> > Sorry about this. Ceph version 0.92.1, and we plan to upgrade to 10.2.0
> > in a short time.
>
> I'd never run (in production) one of the short-term support versions like
> Kraken; you're not getting ANY bug fixes there at all.
>
> But I guess this means that the dire warning when creating readforward
> cache pools was only added with Jewel.
> But the problem with that mode is of course present in all other versions
> that have it.
>
> > About the configuration:
> > 4 nodes, each node with:
> > - 4x HDD WD Re 2TB WD2004FBYZ,
> > - 2x SSD Intel S3610 200GB (one for journal and system with mon, the
> >   second for cache-tier).
> >
> > That gives 32TB RAW HDD space and only 600GB RAW SSD space, and I think
> > the problem is the small size of the cache.
>
> Don't "think" if you can quantify. Between iostat and the Ceph perf
> counters you can determine how much data goes in and out of your cluster
> and OSDs, and how much you'd need to get through a typical day with I/O
> mostly on your cache tier.
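The "quantify, don't think" advice above can be sketched like this. The device names and throughput figures are hypothetical; in practice you would collect the per-device write rates with `iostat -x <interval>` on each OSD node (or from the Ceph perf counters) and sum them over a typical day to size the working set your cache tier would need to absorb.

```python
# Hypothetical iostat-style samples: (device, average write kB/s) over a
# fixed sampling interval. Real numbers would come from `iostat -x` on
# the OSD nodes or from the Ceph OSD perf counters.
SAMPLE_INTERVAL_S = 300  # each sample covers a 5-minute window

samples = [
    ("sdb", 1200.0),
    ("sdc", 950.0),
    ("sdb", 800.0),
    ("sdc", 1100.0),
]

def bytes_written(samples, interval_s):
    """Total bytes written across all sampled windows (kB/s * 1024 * s)."""
    return sum(kbps * 1024 * interval_s for _, kbps in samples)

total = bytes_written(samples, SAMPLE_INTERVAL_S)
print(f"{total / 1024**3:.2f} GiB written in the sampled windows")  # 1.16 GiB
```

Summing a full day of such windows, and comparing against the cache tier's usable capacity, turns "I think the cache is too small" into a number.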
> That said, 200GB effective cache space is likely to be a bottleneck, yes.
>
> > > If you had searched the ML archives for readforward you'd have come
> > > across a very recent thread by me, in which the powers that be state
> > > that this mode is dangerous and not recommended.
> > > During quite some testing with this mode I never encountered any
> > > problems, but consider yourself warned.
> > >
> > > Now readforward will FORWARD reads to the backing storage, so it will
> > > NEVER accelerate reads (promote them to the cache-tier).
> > > The only speedup you will see is for objects that have been previously
> > > written and are still in the cache-tier.
> > >
> > ceph osd pool ls detail
> > pool 4 'ssd' replicated size 3 min_size 1 crush_ruleset 1 object_hash
> > rjenkins pg_num 128 pgp_num 128 last_change 88643 flags
> > hashpspool,incomplete_clones tier_of 5 cache_mode readforward
> > target_bytes 176093659136 hit_set bloom{false_positive_probability:
> > 0.05, target_size: 0, seed: 0} 120s x6 min_read_recency_for_promote 1
> > min_write_recency_for_promote 1 stripe_width 0
> > removed_snaps [1~14d,150~27,178~8,183~8,18c~12,1a0~22,1c4~4,1c9~1b]
> > pool 5 'sata' replicated size 3 min_size 1 crush_ruleset 2 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 88643 lfor 66807 flags
> > hashpspool tiers 4 read_tier 4 write_tier 4 stripe_width 0
> > removed_snaps [1~14d,150~27,178~8,183~8,18c~12,1a0~22,1c4~4,1c9~1b]
> >
> > The setup is over 1 year old. In ceph status I see flushing, promote
> > and evicting operations. Maybe it depends on my old version?
>
> Nothing to do with your version as far as vulnerability to the problem is
> concerned.
>
> And you see all the flushing etc. because WRITES are going through your
> cache-tier, of course, as I stated above.
> However if your goal is to cache reads, this is the wrong mode and in
> general probably a bad fit for a small cache-tier.
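The cache-relevant fields in the `ceph osd pool ls detail` output quoted above can be pulled out programmatically; a minimal sketch, using the pool 4 line from this thread as sample input:

```python
import re

# One line of `ceph osd pool ls detail` output, as quoted in the thread.
detail = ("pool 4 'ssd' replicated size 3 min_size 1 crush_ruleset 1 "
          "object_hash rjenkins pg_num 128 pgp_num 128 last_change 88643 "
          "flags hashpspool,incomplete_clones tier_of 5 cache_mode "
          "readforward target_bytes 176093659136 hit_set "
          "bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} "
          "120s x6 min_read_recency_for_promote 1 "
          "min_write_recency_for_promote 1 stripe_width 0")

def parse_pool_detail(line):
    """Extract the cache-relevant fields discussed in this thread."""
    fields = {}
    for key in ("size", "min_size", "cache_mode", "target_bytes"):
        m = re.search(rf"{key} (\S+)", line)
        if m:
            fields[key] = m.group(1)
    return fields

info = parse_pool_detail(detail)
print(info["cache_mode"])                   # readforward
print(int(info["target_bytes"]) / 1024**3)  # 164.0 -- the GiB cache target
```

Note that target_bytes here (164 GiB) is already below the 200GB of raw SSD per node, which is consistent with Christian's point that the cache space itself is the bottleneck.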
> Christian
>
> > > Using cache-tiers can work beautifully if you understand the I/O
> > > patterns involved (tricky on a cloud storage with very mixed
> > > clients), can make your cache-tier large enough to cover the hot
> > > objects (working set), or at least (as you are attempting) segregate
> > > the read and write paths as much as possible.
> > >
> > Have you got any good method to analyze the workload?
> > I found this script https://github.com/cernceph/ceph-scripts and tried
> > to see reads and writes per length, but how do I know whether it is
> > random or sequential I/O?
> >
> > > > We are using only RBD. Based on the ceph docs, RBD has a bad I/O
> > > > pattern for cache tiering. I'm looking for information about other
> > > > possibilities to accelerate reads on RBD with SSD drives.
> > >
> > > The documentation rightly warns about things, so people don't have
> > > unrealistic expectations. However YOU need to look at YOUR loads,
> > > patterns and usage and then decide if it is beneficial or not.
> > >
> > > As I hinted above, analyze your systems: are the reads actually slow,
> > > or are they slowed down by competing writes to the same storage?
> > >
> > > Cold reads (OSD server just rebooted, no cache has that object in it)
> > > will obviously not benefit from any scheme.
> > >
> > > Reads from the HDD OSDs can very much benefit from having enough RAM
> > > to hold all the SLAB objects (dentries etc.) in memory, so you can
> > > avoid disk access to actually find the object.
> > >
> > > For speeding up the actual data read you have the option of the
> > > cache-tier (in writeback mode, with proper promotion and retention
> > > configuration).
> > >
> > > Or something like bcache on the OSD servers, discussed here several
> > > times as well.
> > >
> > > > The second question: is there any cache tier mode where the replica
> > > > count can be set to 1, for the best use of SSD space?
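Returning to the "random or sequential" question above: given a trace of (offset, length) requests for one image or device, one simple heuristic is to count how often a request starts exactly where the previous one ended. A hedged sketch (the trace data is hypothetical; real traces would come from blktrace or similar tooling):

```python
def sequential_fraction(ops):
    """ops: list of (offset, length) tuples for one device/image, in
    arrival order. A transition counts as 'sequential' if the next op
    starts exactly where the previous op ended; everything else is
    treated as random."""
    if len(ops) < 2:
        return 0.0
    seq = sum(
        1
        for (prev_off, prev_len), (off, _) in zip(ops, ops[1:])
        if off == prev_off + prev_len
    )
    return seq / (len(ops) - 1)

# A mostly sequential stream: 4 KiB reads marching forward, then one seek.
trace = [(0, 4096), (4096, 4096), (8192, 4096), (1_000_000, 4096)]
print(sequential_fraction(trace))  # 2 of 3 transitions are sequential
```

A high fraction suggests streaming I/O that HDDs handle well anyway; a low fraction is the random pattern that actually benefits from SSD caching.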
> > >
> > > A cache-tier (the same is true for any other real cache method) will
> > > at any given time have objects in it that are NOT on the actual
> > > backing storage, when it is used to cache writes.
> > > So it needs to be just as redundant as the rest of the system, at
> > > least a replica of 2 with sufficiently small/fast SSDs.
> > >
> > OK, I understand.
> >
> > > With bcache etc. just caching reads, you can get away with a single
> > > replica of course; however failing SSDs may then cause your cluster
> > > to melt down.
> > >
> > I will search the ML for this.
> >
> > > Christian
> > > --
> > > Christian Balzer        Network/Systems Engineer
> > > [email protected]        Rakuten Communications
>
>
> --
> Christian Balzer        Network/Systems Engineer
> [email protected]        Rakuten Communications
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
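The redundancy trade-off discussed above, in numbers for this cluster: usable cache capacity for a replicated cache pool is roughly raw SSD space divided by the replica count (ignoring filesystem and Ceph overhead). The 600GB figure is the raw cache-tier SSD space from this thread.

```python
def usable_cache_bytes(raw_bytes: int, replicas: int) -> int:
    """Usable capacity of a replicated cache pool: raw space divided by
    the replica count (filesystem and Ceph overhead ignored)."""
    return raw_bytes // replicas

RAW_SSD = 600 * 1024**3  # ~600GB raw cache-tier SSD across the 4 nodes

for r in (3, 2, 1):
    gib = usable_cache_bytes(RAW_SSD, r) / 1024**3
    print(f"replica {r}: {gib:.0f} GiB usable")
```

Dropping from replica 3 to replica 2, as suggested, already buys 50% more cache; replica 1 would triple it but, per the discussion, risks losing dirty objects that exist only in the cache.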
