Please check this PR: <a href="https://github.com/ceph/ceph/pull/5670"
target="_blank">https://github.com/ceph/ceph/pull/5670</a><br><br>Thanks.<br>Zhi
Zhang (David)<br><br><br>----------------------------------------<br>>
Date: Tue, 25 Aug 2015 06:10:19 -0700<br>> From: [email protected]<br>>
To: [email protected]<br>> CC: [email protected]<br>>
Subject: Re: Long tail latency due to journal aio io_submit takes long time to
return<br>><br>> On Tue, 25 Aug 2015, Z Zhang wrote:<br>> > Hi
Ceph-devel,<br>> ><br>> > Recently we have seen some occasional
long-tail latency caused by<br>> > journal aio io_submit taking a long
time to return. We added some logs to<br>> > FileJournal.cc and got the
following output as an example. The<br>> > io_submit call takes ~400ms
to return, and occasionally even up to<br>> > 900ms.<br>>
><br>> >
------------------------------------------------------------<br>> >
2015-08-25 15:24:39.535206 7f22699c2700 20 journal write_thread_entry woke
up<br>> > 2015-08-25 15:24:39.535212 7f22699c2700 10 journal room
10460594175 max_size 10484711424 pos 4786679808 header.start 4762566656 top
4096<br>> > 2015-08-25 15:24:39.535215 7f22699c2700 10 journal
check_for_full at 4786679808 : 532480 < 10460594175<br>> > 2015-08-25
15:24:39.535216 7f22699c2700 15 journal prepare_single_write 1 will write
4786679808 : seq 13075224 len 526537 -> 532480 (head 40 pre_pad 4046 ebl
526537 post_pad 1817 tail 40) (ebl alignment 4086)<br>> > 2015-08-25
15:24:39.535322 7f22699c2700 20 journal prepare_multi_write queue_pos now
4787212288<br>> > 2015-08-25 15:24:39.535324 7f22699c2700 15 journal
do_aio_write writing 4786679808~532480<br>> > 2015-08-25 15:24:39.535434
7f22699c2700 10 journal align_bl total memcopy: 532480<br>> > 2015-08-25
15:24:39.535439 7f22699c2700 20 journal write_aio_bl 4786679808~532480 seq
13075224<br>> > 2015-08-25 15:24:39.535442 7f22699c2700 20 journal
write_aio_bl .. 4786679808~532480 in 1<br>> > 2015-08-25 15:24:39.535444
7f22699c2700 5 journal io_submit starting...<br>> ><br>> >
2015-08-25 15:24:39.990372 7f22699c2700 5 journal io_submit return value:
1<br>> > 2015-08-25 15:24:39.990396 7f22699c2700 5 journal put_throttle
finished 1 ops and 526537 bytes, now 12 ops and 6318444 bytes<br>> >
2015-08-25 15:24:39.990401 7f22691c1700 20 journal write_finish_thread_entry
waiting for aio(s)<br>> > 2015-08-25 15:24:39.990406 7f22699c2700 20
journal write_thread_entry aio throttle: aio num 1 bytes 532480 ... exp 2
min_new 4 ... pending 6318444<br>> > 2015-08-25 15:24:39.990410
7f22699c2700 10 journal room 10460061695 max_size 10484711424 pos 4787212288
header.start 4762566656 top 4096<br>> > 2015-08-25 15:24:39.990413
7f22699c2700 10 journal check_for_full at 4787212288 : 532480 <
10460061695<br>> > 2015-08-25 15:24:39.990415 7f22699c2700 15 journal
prepare_single_write 1 will write 4787212288 : seq 13075225 len 526537 ->
532480 (head 40 pre_pad 4046 ebl 526537 post_pad 1817 tail 40) (ebl alignment
4086)<br>> >
------------------------------------------------------------<br>><br>>
This is probably the kernel throttling us because the request queue is<br>>
full / device is busy / whatever.<br>><br>> > And write_aio_bl holds
aio_lock until it returns, so<br>> > write_finish_thread_entry can't
process completed aios while io_submit is<br>> > blocking. We reduced
aio_lock's scope in<br>> > write_aio_bl so that io_submit is no longer
called under the lock.<br>> > io_submit doesn't seem to need the lock
in the journal because there is<br>> > only one write_thread_entry and
one write_finish_thread_entry.<br>> ><br>> > This helps
completed aios get processed as soon as possible, but the<br>> > one
blocked in io_submit and the ones queued behind it still take<br>> >
more time.<br>> ><br>> > Any idea about this?<br>><br>> That
sounds like the right fix! I don't think there is anything else to<br>> be
done.. if the kernel is throttling us it's generally for a good reason,<br>>
and there isn't anything else we can do.<br>><br>> Do you have a
patch?<br>><br>> sage<br>
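<br>The lock-scope change discussed above can be sketched as a small,
self-contained model. This is not the actual FileJournal code: the names
AioState and slow_submit are illustrative, and slow_submit merely sleeps to
stand in for an io_submit call blocked by kernel throttling. The point it
demonstrates is that holding aio_lock only for queue bookkeeping lets the
finish thread keep draining completed aios while a submit is stalled:<br><br>

```cpp
// Simplified model of narrowing aio_lock's scope in write_aio_bl.
// aio_lock guards only the in-flight aio bookkeeping; the blocking
// submit call is made with the lock released, so the completion
// thread can make progress concurrently. Illustrative names only.
#include <cassert>
#include <chrono>
#include <deque>
#include <mutex>
#include <thread>

struct AioState {
    std::mutex aio_lock;      // guards pending and completed
    std::deque<int> pending;  // aios submitted but not yet reaped
    int completed = 0;        // aios reaped by the finish thread
};

// Stand-in for io_submit(): occasionally blocks for a long time,
// modeling the kernel throttling described in the thread.
static void slow_submit() {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}

// After the fix: hold aio_lock only for the queue update, never
// across the potentially slow submit call.
static void write_aio_bl(AioState& s, int seq) {
    {
        std::lock_guard<std::mutex> l(s.aio_lock);
        s.pending.push_back(seq);  // record the in-flight aio
    }
    slow_submit();                 // blocking call, lock NOT held
}

// Model of write_finish_thread_entry: drains completed aios; it can
// acquire aio_lock even while a submit is blocked in slow_submit.
static void write_finish_thread(AioState& s, int expect) {
    while (true) {
        {
            std::lock_guard<std::mutex> l(s.aio_lock);
            if (!s.pending.empty()) {
                s.pending.pop_front();
                ++s.completed;
            }
            if (s.completed == expect)
                return;
        }
        std::this_thread::yield();
    }
}
```

<br>Because there is a single submitting thread and a single finishing
thread, the submit call itself needs no serialization; the lock only has to
protect the shared queue, which is exactly what the narrowed scope does.<br>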