Hi,
>> So, AFAICT, the bulk of the write would be writing out the pgmap to
>> disk every second or so.
>
> It should be writing out the full map only every N commits... see 'paxos
> stash full interval', which defaults to 25.
But doesn't it also write the map in full whenever there is a new pgmap?
I get a new one about every second, and its size times that frequency
seemed to match the IO rate pretty well, which is why I thought it was
the cause of the IO.
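
To make the arithmetic explicit, here is a rough sketch. The 4 MiB map
size and the function name are made up for illustration, not measured on
my cluster; only the interval of 25 comes from the default quoted above.

```python
def full_map_write_rate(map_bytes, commits_per_sec, stash_full_interval=1):
    """Bytes per second spent writing full maps, if one full copy is
    written every `stash_full_interval` commits."""
    return map_bytes * commits_per_sec / stash_full_interval

pgmap_bytes = 4 * 1024 * 1024  # hypothetical pgmap size (4 MiB), an assumption

# What I assumed was happening: a full write for every new pgmap (~1/s).
every_commit = full_map_write_rate(pgmap_bytes, commits_per_sec=1.0)

# What the default 'paxos stash full interval' of 25 would give instead.
every_25 = full_map_write_rate(pgmap_bytes, commits_per_sec=1.0,
                               stash_full_interval=25)

print(f"full write every commit: {every_commit / 1e6:.2f} MB/s")
print(f"full write every 25:     {every_25 / 1e6:.2f} MB/s")
```

If the observed IO rate is close to the first figure rather than the
second, that would suggest something is writing the full map more often
than every 25 commits.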
>> Is it really needed to write it in full ? It doesn't change all that
>> much AFAICT, so writing incremental changes with only periodic flush
>> might be a better option ?
>
> Right. It works this way now only because we haven't fully transitioned
> from the old scheme. The next step is to store the PGMap over lots of
> leveldb keys (one per pg) so that there is no big encode/decode of the
> entire PGMap structure...
Makes sense. I'm not sure about the per-key overhead of leveldb though,
in the case where there are lots (> 10k) of PGs.
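
A back-of-the-envelope way to frame that concern is below. All the
numbers are placeholders I picked for illustration (entry size, per-key
overhead, number of changed PGs); I have not measured leveldb's actual
overhead, so treat this as a sketch of the trade-off only.

```python
def full_write_bytes(total_pgs, entry_bytes):
    """Bytes written when the whole map is encoded as one blob."""
    return total_pgs * entry_bytes

def per_key_write_bytes(changed_pgs, entry_bytes, per_key_overhead):
    """Bytes written when only the changed PGs are stored, one key each."""
    return changed_pgs * (entry_bytes + per_key_overhead)

total_pgs = 10_000        # "lots" of PGs, as in the case I am worried about
entry_bytes = 500         # hypothetical encoded size of one PG entry
per_key_overhead = 50     # hypothetical per-key cost, NOT a measured value
changed_pgs = 100         # hypothetical number of PGs changed per update

full = full_write_bytes(total_pgs, entry_bytes)
incremental = per_key_write_bytes(changed_pgs, entry_bytes, per_key_overhead)

# Fraction of PGs that must change before per-key writes cost more than
# rewriting the whole map.
break_even = entry_bytes / (entry_bytes + per_key_overhead)

print(full, incremental, round(break_even, 3))
```

With these placeholder values the per-key scheme only loses once most of
the PGs change in a single update, but the real answer depends on what
the per-key overhead actually is.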
Cheers,
Sylvain