On Wed, May 21, 2014 at 11:23 PM, Sage Weil <s...@inktank.com> wrote:
> On Wed, 21 May 2014, Haomai Wang wrote:
>> I pushed the commit to fix this problem
>> (https://github.com/ceph/ceph/pull/1848).
>>
>> With a test program (each sync request is issued after ten write requests),
>> a significant improvement is noticed.
>>
>> aio_flush                          sum: 914750     avg: 1239   count: 738      max: 4714   min: 1011
>> flush_set                          sum: 904200     avg: 1225   count: 738      max: 4698   min: 999
>> flush                              sum: 641648     avg: 173    count: 3690     max: 1340   min: 128
>>
>> Compared to the last mail, each aio_flush request now takes 1239 ns
>> instead of 24145 ns.
>
> Good catch!  That's a great improvement.
>
> The patch looks clearly correct.  We can probably do even better by
> putting the Objects on a list when they get the first dirty buffer so that
> we only cycle through the dirty ones.  Or, have a global list of dirty
> buffers (instead of dirty objects -> dirty buffers).

Yes, I think the dirty-object list is worth maintaining.
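
To make the idea concrete, here is a minimal standalone sketch of keeping such
a dirty-object set. It is not the actual ObjectCacher code; Object, ObjectSet,
mark_dirty and the writeback step are simplified placeholders, but it shows the
shape of the change: an object registers itself once when its first buffer goes
dirty, and flush_set only walks that set.

// Minimal standalone sketch (NOT the real ObjectCacher code).
#include <set>
#include <vector>
#include <iostream>

struct Object {
  bool dirty = false;              // stands in for "has at least one dirty buffer"
};

struct ObjectSet {
  std::vector<Object*> objects;    // every object ever opened (can be huge)
  std::set<Object*> dirty_objects; // only the objects that hold dirty buffers
};

// Called when a write makes the first buffer of an object dirty.
void mark_dirty(ObjectSet& oset, Object* ob) {
  if (!ob->dirty) {
    ob->dirty = true;
    oset.dirty_objects.insert(ob); // register once, on the first dirty buffer
  }
}

// Flushing only needs to visit the dirty objects, not all of oset.objects.
void flush_set(ObjectSet& oset) {
  for (Object* ob : oset.dirty_objects) {
    // ... write back the object's dirty buffers here ...
    ob->dirty = false;
  }
  oset.dirty_objects.clear();
}

int main() {
  ObjectSet oset;
  std::vector<Object> storage(30000);   // ~30k objects, like the logs below
  for (auto& ob : storage)
    oset.objects.push_back(&ob);
  mark_dirty(oset, oset.objects[7]);    // only a handful are dirty
  mark_dirty(oset, oset.objects[42]);
  flush_set(oset);                      // visits 2 objects instead of 30000
  std::cout << "flushed only the dirty objects\n";
  return 0;
}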

According to my test: I create a 1TB volume holding 500GB of DB data and run
the test program over 100GB of hot data, so the test touches nearly 3000
objects. I measured the time the flush_set loop consumed and compared it to
the total latency reported by "ceph perf dump" (aio_flush_latency). The
flush_set loop usually consumes 1-4ms, while the whole flush request takes
3-8ms, so the loop is still a large proportion of the total.

Sample logs:
2014-05-26 22:11:16.392538 7fa6c6e41700  0 objectcacher flush_set total: 28952 flushed: 4
2014-05-26 22:11:16.398726 7fa6c6e41700  0 objectcacher flush_set: time 0.004436
2014-05-26 22:11:16.398745 7fa6c6e41700  0 objectcacher flush_set total: 28955 flushed: 15
2014-05-26 22:11:16.400992 7fa6c6e41700  0 objectcacher flush_set: time 0.002149
2014-05-26 22:11:17.039226 7fa6c6e41700  0 objectcacher flush_set: time 0.001304
2014-05-26 22:11:17.039274 7fa6c6e41700  0 objectcacher flush_set total: 28950 flushed: 4
2014-05-26 22:11:17.045580 7fa6c6e41700  0 objectcacher flush_set: time 0.004523
2014-05-26 22:11:17.045636 7fa6c6e41700  0 objectcacher flush_set total: 28959 flushed: 15
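
(For reference, the timing above comes from extra instrumentation I added
around the flush_set loop, roughly like the simplified sketch below. The real
code sits inside ObjectCacher::flush_set and uses Ceph's own logging rather
than std::cout.)

#include <chrono>
#include <iostream>
#include <vector>

struct Object {
  bool dirty = false;
};

// Walk the whole object set, count how many objects were examined versus how
// many actually had dirty data, and report how long the loop took.
void flush_set_timed(const std::vector<Object*>& object_set) {
  auto start = std::chrono::steady_clock::now();
  size_t total = 0, flushed = 0;
  for (Object* ob : object_set) {
    ++total;
    if (ob->dirty)
      ++flushed;                  // ... the actual writeback would happen here ...
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  std::cout << "objectcacher flush_set total: " << total
            << " flushed: " << flushed << "\n";
  std::cout << "objectcacher flush_set: time " << elapsed.count() << "\n";
}

int main() {
  std::vector<Object> storage(28952);             // mirrors the "total: 28952" above
  std::vector<Object*> object_set;
  for (auto& ob : storage)
    object_set.push_back(&ob);
  storage[10].dirty = storage[200].dirty = true;  // only a few dirty objects
  flush_set_timed(object_set);
  return 0;
}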

I think a request consuming 4ms on the client side is still too expensive.
And if I create a 2TB or 3TB volume, the time will grow even larger.
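For a rough sense of scale (assuming the default 4MB rbd object size; that
default is an assumption on my part): a 1TB volume maps to about 1TB / 4MB =
~262,000 objects and a 3TB volume to ~786,000, so any flush path that walks
the whole object_set gets proportionally slower as the image grows and more of
its objects are opened.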

>
> sage
>
>>
>> I hope this is the root cause of the poor db-on-rbd performance.
>>
>> On Wed, May 21, 2014 at 6:15 PM, Haomai Wang <haomaiw...@gmail.com> wrote:
>> > Hi all,
>> >
>> > I remember there was a discussion about DB (MySQL) performance on rbd.
>> > Recently I tested mysql-bench with rbd and found awful performance, so I
>> > dove into it and found that the main cause is the "flush" request from the
>> > guest. As we know, applications such as MySQL (and Ceph itself) keep their
>> > own journal for durability, and the journal usually issues sync/direct IO.
>> > If the fs barrier is on, each sync IO operation makes the kernel issue a
>> > "sync" (barrier) request to the block device. Here, qemu calls
>> > "rbd_aio_flush" to serve it.
>> >
>> > Via systemtap, I found something surprising:
>> > aio_flush                          sum: 4177085    avg: 24145  count: 173      max: 28172  min: 22747
>> > flush_set                          sum: 4172116    avg: 24116  count: 173      max: 28034  min: 22733
>> > flush                              sum: 3029910    avg: 4      count: 670477   max: 1893   min: 3
>> >
>> > These statistics were gathered over 5s. Most of the time is consumed in
>> > "ObjectCacher::flush". What's more, as time goes on, the flush count keeps
>> > increasing.
>> >
>> > After reading the source, I found the root cause is "ObjectCacher::flush_set":
>> > it iterates over the "object_set" looking for dirty buffers, and the
>> > "object_set" contains all objects ever opened. For example:
>> >
>> > 2014-05-21 18:01:37.959013 7f785c7c6700  0 objectcacher flush_set total: 5919 flushed: 5
>> > 2014-05-21 18:01:37.999698 7f785c7c6700  0 objectcacher flush_set total: 5919 flushed: 5
>> > 2014-05-21 18:01:38.038405 7f785c7c6700  0 objectcacher flush_set total: 5920 flushed: 5
>> > 2014-05-21 18:01:38.080118 7f785c7c6700  0 objectcacher flush_set total: 5920 flushed: 5
>> > 2014-05-21 18:01:38.119792 7f785c7c6700  0 objectcacher flush_set total: 5921 flushed: 5
>> > 2014-05-21 18:01:38.162004 7f785c7c6700  0 objectcacher flush_set total: 5922 flushed: 5
>> > 2014-05-21 18:01:38.202755 7f785c7c6700  0 objectcacher flush_set total: 5923 flushed: 5
>> > 2014-05-21 18:01:38.243880 7f785c7c6700  0 objectcacher flush_set total: 5923 flushed: 5
>> > 2014-05-21 18:01:38.284399 7f785c7c6700  0 objectcacher flush_set total: 5923 flushed: 5
>> >
>> > These logs record the iteration info: the loop checks 5920 objects,
>> > but only 5 of them are dirty.
>> >
>> > So I think the solution is to make "ObjectCacher::flush_set" only
>> > iterate over the objects which are dirty.
>> >
>> > --
>> > Best Regards,
>> >
>> > Wheat
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat



-- 
Best Regards,

Wheat