As a general observation, calling stat() on objects in Ceph is
relatively slow. I'm getting roughly 10K stats per second using AIO,
and even then it is really *really* bursty, to the point where there
can be 5 seconds of activity flowing in only one direction before the
callback thread wakes up and processes all queued completions in a
single burst.
At our current rate, with more than 1 billion objects in a pool, it
looks like checking the existence of every object would take around
19-24 hours to complete.
Given that our starting point, before we began migrating to Ceph, was
around 1 hour to check the existence of every object, this is
something of a concern. Are there any ways, via librados, to improve
the throughput of processing objects?
Adding more instances or sharding the work doesn't seem to increase
overall throughput at all. Caching won't help either: there is no
determinism in what's accessed, and given the size of the pool, the OS
filesystem cache is useless anyway.
ceph-users mailing list