Hello,
Ceph 0.94.5 for the record.
As some may remember, I phased in a 2TB cache tier 5 weeks ago.
By now it has reached roughly 60% usage, which is what I have
cache_target_dirty_ratio set to.
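For reference, the thresholds should also be readable back per pool with
"ceph osd pool get"; a minimal sketch, assuming a cache pool simply named
"cache" (the name is only a placeholder):
---
#!/usr/bin/env python
# Minimal sketch: read back the cache tier thresholds of a pool via the
# ceph CLI. The pool name "cache" is only a placeholder, adjust to taste.
import subprocess

POOL = "cache"

for var in ("cache_target_dirty_ratio", "cache_target_full_ratio"):
    # prints e.g. "cache_target_dirty_ratio: 0.6"
    out = subprocess.check_output(
        ["ceph", "osd", "pool", "get", POOL, var]).decode()
    print(out.strip())
---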
And for the last 3 days I have been seeing some writes (op_in_bytes) to the
backing storage (a.k.a. the HDD pool), which hadn't seen any write activity
for the aforementioned 5 weeks.
Alas, my graphite dashboard showed no flushes (tier_flush), whereas
tier_promote on the cache pool could always be matched, more or less, to
op_out_bytes on the HDD pool, which makes sense, as a promotion reads the
object up from the backing pool.
The documentation (RH site) just parrots the names of the various perf
counters, so no help there. OK, let's look at what we've got:
---
"tier_promote": 49776,
"tier_flush": 0,
"tier_flush_fail": 0,
"tier_try_flush": 558,
"tier_try_flush_fail": 0,
"agent_flush": 558,
"tier_evict": 0,
"agent_evict": 0,
---
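For reference, these counters are exposed per OSD via the admin socket
("ceph daemon osd.N perf dump"); below is a minimal sketch of pulling just
the tier/agent counters that way, roughly the kind of thing that ends up
feeding a graphite graph. It assumes the counters sit in the "osd" section
of the perf dump output and that osd.0 is only a placeholder id on the
local host:
---
#!/usr/bin/env python
# Minimal sketch: pull the tiering counters from one OSD's admin socket.
# Assumptions: run on the OSD's host, osd.0 is only a placeholder id, and
# the counters live in the "osd" section of the perf dump output.
import json
import subprocess

OSD_ID = 0
WANTED = ("tier_promote", "tier_flush", "tier_flush_fail", "tier_try_flush",
          "tier_try_flush_fail", "agent_flush", "tier_evict", "agent_evict")

raw = subprocess.check_output(
    ["ceph", "daemon", "osd.%d" % OSD_ID, "perf", "dump"]).decode()
counters = json.loads(raw)["osd"]

for name in WANTED:
    print("%s: %s" % (name, counters.get(name, "n/a")))
---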
Lots of promotions, that's fine.
Not a single tier_flush, er, wot? So what does that counter denote, then?
OK, clearly tier_try_flush and agent_flush are where the flushing is
actually recorded (in my test cluster they differ, as I have run that
against the wall several times).
No evictions yet; those will only start at 90% usage
(cache_target_full_ratio).
So now I changed the graph data source for flushes to tier_try_flush;
however, that does not match most of the op_in_bytes (or any other counter
I tried!) on the HDD OSDs.
As in, there are flushes but no activity on the HDD OSDs, at least as far
as Ceph is concerned.
I can, however, match the flushes to actual disk activity on the HDDs
(gathered by collectd), disks which are otherwise totally dormant.
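To spot-check that mismatch outside of graphite, something along these
lines can be used to compare the deltas directly; a minimal sketch, with
both OSD ids being placeholders (one cache OSD, one HDD OSD, both with
their admin sockets on the local host):
---
#!/usr/bin/env python
# Minimal sketch: sample tier_try_flush on a cache-pool OSD and op_in_bytes
# on a backing (HDD) pool OSD twice, then print the deltas for comparison.
# Both OSD ids are placeholders and their admin sockets must be local.
import json
import subprocess
import time

CACHE_OSD = 0    # an OSD in the cache pool (placeholder)
HDD_OSD = 10     # an OSD in the backing HDD pool (placeholder)
INTERVAL = 60    # seconds between the two samples

def osd_perf(osd_id):
    raw = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"]).decode()
    return json.loads(raw)["osd"]

before_cache, before_hdd = osd_perf(CACHE_OSD), osd_perf(HDD_OSD)
time.sleep(INTERVAL)
after_cache, after_hdd = osd_perf(CACHE_OSD), osd_perf(HDD_OSD)

print("tier_try_flush delta: %d" %
      (after_cache["tier_try_flush"] - before_cache["tier_try_flush"]))
print("op_in_bytes delta   : %d" %
      (after_hdd["op_in_bytes"] - before_hdd["op_in_bytes"]))
---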
Can somebody shed some light on this? Is it a known problem, or in need of
a bug report?
Christian
--
Christian Balzer Network/Systems Engineer
[email protected] Global OnLine Japan/Rakuten Communications
http://www.gol.com/