@Christian, thanks for the quick answer, please look below.
> -----Original Message-----
> From: Christian Balzer [mailto:[email protected]]
> Sent: Monday, July 3, 2017 1:39 PM
> To: [email protected]
> Cc: Mateusz Skała <[email protected]>
> Subject: Re: [ceph-users] Cache Tier or any other possibility to accelerate
> RBD with SSD?
>
>
> Hello,
>
> On Mon, 3 Jul 2017 13:01:06 +0200 Mateusz Skała wrote:
>
> > Hello,
> >
> > We are using cache-tier in Read-forward mode (replica 3) for
> > accelerate reads and journals on SSD to accelerate writes.
>
> OK, lots of things wrong with this statement, but firstly, Ceph version (it is
> relevant) and more details about your setup and SSDs used would be
> interesting and helpful.
>
Sorry about this. Ceph version is 0.92.1 and we plan to upgrade to 10.2.0
shortly.
About the configuration:
4 nodes, each node with:
- 4x HDD WD Re 2TB WD2004FBYZ,
- 2x SSD Intel S3610 200GB (one for journal and system with mon, second for
cache-tier).
That gives 32TB of raw HDD space and only 600GB of raw SSD space, and I think
the problem is the small size of the cache.
> If you had searched the ML archives for readforward you'd come across a
> very recent thread by me, in which the powers that be state that this mode is
> dangerous and not recommended.
> During quite some testing with this mode I never encountered any problems,
> but consider yourself warned.
>
> Now readforward will FORWARD reads to the backing storage, so it will
> NEVER accelerate reads (promote them to the cache-tier).
> The only speedup you will see is for objects that have been previously
> written and are still in the cache-tier.
>
ceph osd pool ls detail
pool 4 'ssd' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins
pg_num 128 pgp_num 128 last_change 88643 flags hashpspool,incomplete_clones
tier_of 5 cache_mode readforward target_bytes 176093659136 hit_set
bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 120s x6
min_read_recency_for_promote 1 min_write_recency_for_promote 1 stripe_width 0
removed_snaps [1~14d,150~27,178~8,183~8,18c~12,1a0~22,1c4~4,1c9~1b]
pool 5 'sata' replicated size 3 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 512 pgp_num 512 last_change 88643 lfor 66807 flags hashpspool tiers 4
read_tier 4 write_tier 4 stripe_width 0
removed_snaps [1~14d,150~27,178~8,183~8,18c~12,1a0~22,1c4~4,1c9~1b]
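For completeness, I assume the tier was set up roughly with commands like these
(reconstructed from the output above, not the exact history of this cluster;
values are the ones shown in the output):

```shell
# Rough reconstruction of how a tier like the one above is created:
ceph osd tier add sata ssd                  # attach 'ssd' as cache for 'sata'
ceph osd tier cache-mode ssd readforward    # the mode shown in the output
ceph osd tier set-overlay sata ssd          # route client I/O via the tier
ceph osd pool set ssd hit_set_type bloom
ceph osd pool set ssd hit_set_count 6       # "x6" in the output
ceph osd pool set ssd hit_set_period 120    # "120s" in the output
ceph osd pool set ssd target_max_bytes 176093659136
ceph osd pool set ssd min_read_recency_for_promote 1
ceph osd pool set ssd min_write_recency_for_promote 1
```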
The setup is over 1 year old. In ceph status I see flush, promote and evict
operations. Maybe that depends on my old version?
> Using cache-tiers can work beautifully if you understand the I/O patterns
> involved (tricky on a cloud storage with very mixed clients), can make your
> cache-tier large enough to cover the hot objects (working set) or at least (as
> you are attempting) to segregate the read and write paths as much as
> possible.
>
Do you have any good method to analyze the workload?
I found these scripts https://github.com/cernceph/ceph-scripts and tried to
look at reads and writes per request length, but how can I tell whether the
I/O is random or sequential?
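In case it helps others reading: a minimal sketch of what I mean (my own
illustration, not from the cern scripts), counting an op as sequential when it
starts exactly where the previous op on the same image ended:

```python
# My own illustration: given a trace of (image, offset, length) tuples
# in arrival order, count an op as sequential when it starts exactly
# where the previous op on the same image ended, otherwise as random.
def classify_ops(ops):
    last_end = {}      # image -> end offset of its previous op
    seq = rand = 0
    for image, off, length in ops:
        if last_end.get(image) == off:
            seq += 1
        else:
            rand += 1
        last_end[image] = off + length
    return seq, rand

# Two back-to-back 4KiB reads, then a jump elsewhere on the image:
ops = [("vm1", 0, 4096), ("vm1", 4096, 4096), ("vm1", 1048576, 4096)]
print(classify_ops(ops))  # -> (1, 2): the first op has no predecessor
```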
> > We are using only RBD. Based
> > on the ceph-docs, RBD have bad I/O pattern for cache tier. I'm
> > looking for information about other possibility to accelerate reads on
> > RBD with SSD drives.
> >
> The documentation rightly warns about things, so people don't have
> unrealistic expectations. However YOU need to look at YOUR loads, patterns
> and usage and then decide if it is beneficial or not.
>
> As I hinted above, analyze your systems, are the reads actually slow or are
> they slowed down by competing writes to the same storage?
>
> Cold reads (OSD server just rebooted, no cache has that object in it) will
> obviously not benefit from any scheme.
>
> Reads from the HDD OSDs can very much benefit by having enough RAM to
> hold all the SLAB objects (direntry etc) in memory, so you can avoid disk
> access to actually find the object.
>
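For my own checking: I assume something like this shows whether those kernel
caches fit in RAM (my own sketch, standard Linux tools):

```shell
# Show the biggest kernel slab consumers; look for 'dentry' and the
# '*_inode_cache' entries Christian refers to:
slabtop -o | head -n 15
# vfs_cache_pressure below the default 100 makes the kernel keep
# dentries/inodes cached longer at the expense of page cache:
sysctl vm.vfs_cache_pressure
```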
> Speeding up the actual data read you have the option of the cache-tier (in
> writeback mode, with proper promotion and retention configuration).
>
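If we go the writeback route, I assume it would be along these lines (pool
name from my setup; the ratios are only examples I would still have to tune):

```shell
ceph osd tier cache-mode ssd writeback
# Promotion/retention knobs Christian mentions; example values only:
ceph osd pool set ssd cache_target_dirty_ratio 0.4    # start flushing at 40% dirty
ceph osd pool set ssd cache_target_full_ratio 0.8     # start evicting at 80% full
ceph osd pool set ssd min_read_recency_for_promote 2  # promote only warm objects
```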
> Or something like bcache on the OSD servers, discussed here several times
> as well.
>
> > The second question, is it any cache tier mode, that replica can be
> > set on 1, for best use of SSD space?
> >
> A cache-tier (the same true for any other real cache methods) will at any
> given time have objects in it that are NOT on the actual backing storage when
> it is used to cache writes.
> So it needs to be just as redundant as the rest of the system, at least a
> replica
> of 2 with sufficiently small/fast SSDs.
>
OK, I understand.
> With bcache etc just caching reads, you can get away with a single replication
> of course, however failing SSDs may then cause your cluster to melt down.
>
I will search the ML for this.
> Christian
> --
> Christian Balzer Network/Systems Engineer
> [email protected] Rakuten Communications
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com