Hey Samuel,
I gave it a good run on the production-cluster and, unfortunately, it
still doesn't seem to have solved the problem. It seemed OK for a while
and individual OSD CPU-usage seemed quite low, but as the cluster's load
increased during the day, things got slower again. Write-performance
within a VM crawled to 30MB/sec and at some point, I got only 10MB/sec
on reads in that same VM. I also did RADOS bench-tests with `rados
--pool rbd bench 120 write' and those got several hundred MB/sec
on the same cluster at the same time of day, so maybe the problem is
RBD-related. Is there any code in the OSD that could influence
RBD-performance alone? Do you know of any other significant changes to
the OSD between Cuttlefish and Dumpling that could result in this?
PS: I also ran the same RADOS bench-tests on my test-cluster, both with
Cuttlefish and with Dumpling without your fix, and got almost identical
results. This supports Mark's suggestion that the problem may be RBD-specific.
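For reference, the two tests amounted to roughly this (the in-VM file
path and size here are just examples, not exactly what I used):

```shell
# RADOS-level benchmark, run directly against the 'rbd' pool
# (this still got several hundred MB/sec):
rados --pool rbd bench 120 write

# Simple linear write to a file inside a VM backed by an RBD volume
# (this is what crawled to ~30MB/sec):
dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024 conv=fdatasync
```

Both ran on the same cluster at the same time of day, which is why the
gap between them points at the RBD path rather than RADOS itself.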
Regards,
Oliver
On 20-08-13 19:40, Samuel Just wrote:
> Can you try dumpling head without the option?
> -Sam
>
> On Tue, Aug 20, 2013 at 1:44 AM, Oliver Daudey <[email protected]> wrote:
>> Hey Mark,
>>
>> Sorry, but after some more tests I have to report that it only worked
>> partly. The load seems lower with "wip-dumpling-pglog-undirty" in
>> place, but the Cuttlefish-osd still seems significantly faster and even
>> with "wip-dumpling-pglog-undirty" in place, things slow down way too
>> much under load. Unfortunately, only my production-cluster seems busy
>> enough to actually show the problem clearly by slowing down. Below is
>> `perf top'-output, fresh from my production-cluster under its regular
>> load:
>>
>> First, the 0.67.1-6-g0c4f2f3 osd with "osd debug pg log writeout = false":
>>
>>  16.53%  [kernel]               [k] intel_idle
>>   6.47%  libleveldb.so.1.9      [.] 0x380a1
>>   5.76%  [kernel]               [k] find_busiest_group
>>   4.11%  libc-2.11.3.so         [.] memcmp
>>   3.95%  kvm                    [.] 0x1f6f31
>>   2.05%  [kernel]               [k] default_send_IPI_mask_sequence_phys
>>   2.03%  [kernel]               [k] _raw_spin_lock
>>   1.87%  libleveldb.so.1.9      [.] leveldb::InternalKeyComparator::Compar
>>   1.57%  libc-2.11.3.so         [.] memcpy
>>   1.37%  libleveldb.so.1.9      [.] leveldb::Block::Iter::Next()
>>   1.26%  [kernel]               [k] hrtimer_interrupt
>>   1.12%  [kernel]               [k] __hrtimer_start_range_ns
>>   1.09%  [kernel]               [k] native_write_cr0
>>   1.05%  libstdc++.so.6.0.13    [.] std::string::_M_mutate(unsigned long,
>>   1.00%  [kernel]               [k] native_write_msr_safe
>>   0.99%  [kernel]               [k] apic_timer_interrupt
>>   0.98%  [kernel]               [k] clockevents_program_event
>>   0.96%  [kernel]               [k] _raw_spin_unlock_irqrestore
>>   0.95%  ceph-osd               [.] PGLog::undirty()
>>   0.92%  [kernel]               [k] find_next_bit
>>   0.91%  libsnappy.so.1.1.2     [.] snappy::RawUncompress(snappy::Source*,
>>   0.88%  [kernel]               [k] __schedule
>>   0.87%  [kernel]               [k] cpumask_next_and
>>   0.84%  [kernel]               [k] do_select
>>   0.80%  [kernel]               [k] fget_light
>>   0.77%  [kernel]               [k] reschedule_interrupt
>>   0.75%  [kernel]               [k] _raw_spin_lock_irqsave
>>   0.62%  libstdc++.so.6.0.13    [.] std::string::append(char const*, unsig
>>   0.59%  [kvm_intel]            [k] vmx_vcpu_run
>>   0.58%  [kernel]               [k] copy_user_generic_string
>>   0.56%  [kernel]               [k] load_balance
>>   0.54%  [kernel]               [k] tg_load_down
>>   0.53%  libpthread-2.11.3.so   [.] pthread_mutex_lock
>>   0.52%  [kernel]               [k] sync_inodes_sb
>>
>> Second, the 0.61.8 osd, under identical load:
>>  21.51%  [kernel]               [k] intel_idle
>>   6.66%  [kernel]               [k] find_busiest_group
>>   6.25%  kvm                    [.] 0x2d214b
>>   1.97%  [kernel]               [k] _raw_spin_lock
>>   1.47%  [kernel]               [k] native_write_msr_safe
>>   1.44%  [kernel]               [k] hrtimer_interrupt
>>   1.37%  [kernel]               [k] __hrtimer_start_range_ns
>>   1.34%  [kernel]               [k] do_select
>>   1.29%  [kernel]               [k] fget_light
>>   1.24%  [kernel]               [k] clockevents_program_event
>>   1.21%  [kernel]               [k] default_send_IPI_mask_sequence_phys
>>   1.18%  [kernel]               [k] cpumask_next_and
>>   1.18%  [kernel]               [k] _raw_spin_unlock_irqrestore
>>   1.15%  [kernel]               [k] find_next_bit
>>   1.14%  [kernel]               [k] __schedule
>>   1.11%  [kernel]               [k] _raw_spin_lock_irqsave
>>   0.98%  [kernel]               [k] apic_timer_interrupt
>>   0.86%  [kernel]               [k] copy_user_generic_string
>>   0.83%  [kernel]               [k] native_write_cr0
>>   0.76%  [kernel]               [k] sync_inodes_sb
>>   0.71%  [kernel]               [k] rcu_needs_cpu
>>   0.69%  libpthread-2.11.3.so   [.] pthread_mutex_lock
>>   0.66%  [kernel]               [k] fput
>>   0.62%  [kernel]               [k] load_balance
>>   0.57%  [vdso]                 [.] 0x7fff3a976700
>>   0.56%  libc-2.11.3.so         [.] memcpy
>>   0.56%  [kernel]               [k] reschedule_interrupt
>>   0.56%  [kernel]               [k] tg_load_down
>>   0.50%  [kernel]               [k] iput
>>
>> I see lots of new differences, but again I don't know what to make of
>> them or which might be significant. LevelDB seems to jump out this
>> time, among other things. Let me know if you need more info.
>>
>>
>> Regards,
>>
>> Oliver
>>
>> On Mon, 2013-08-19 at 15:21 -0500, Mark Nelson wrote:
>>> Hi Oliver,
>>>
>>> Glad that helped! How much more efficient do the cuttlefish OSDs seem
>>> at this point (with wip-dumpling-pglog-undirty)? On modern Intel
>>> platforms we were actually hoping to see CPU usage go down in many cases
>>> due to the use of hardware CRC32 instructions.
>>>
>>> Mark
>>>
>>> On 08/19/2013 03:06 PM, Oliver Daudey wrote:
>>>> Hey Samuel,
>>>>
>>>> Thanks! I installed your version, repeated the same tests on my
>>>> test-cluster and the extra CPU-loading seems to have disappeared. Then
>>>> I replaced one osd of my production-cluster with your modified version
>>>> and its config-option, and it seems to be a lot less CPU-hungry now.
>>>> Although the Cuttlefish-osds still seem to be even more CPU-efficient,
>>>> your changes have definitely helped a lot. We seem to be looking in the
>>>> right direction, at least for this part of the problem.
>>>>
>>>> BTW, I ran `perf top' on the production-node with your modified osd and
>>>> didn't see anything osd-related stand out on top. "PGLog::undirty()"
>>>> was in there, but with much lower usage, right at the bottom of the
>>>> green part of the output.
>>>>
>>>> Many thanks for your help so far!
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Oliver
>>>>
>>>> On Mon, 2013-08-19 at 00:29 -0700, Samuel Just wrote:
>>>>> You're right, PGLog::undirty() looks suspicious. I just pushed a
>>>>> branch wip-dumpling-pglog-undirty with a new config
>>>>> (osd_debug_pg_log_writeout) which if set to false will disable some
>>>>> strictly debugging checks which occur in PGLog::undirty(). We haven't
>>>>> actually seen these checks causing excessive cpu usage, so this may be
>>>>> a red herring.
>>>>> -Sam
>>>>>
>>>>> On Sat, Aug 17, 2013 at 2:48 PM, Oliver Daudey <[email protected]> wrote:
>>>>>> Hey Mark,
>>>>>>
>>>>>> On Sat, 2013-08-17 at 08:16 -0500, Mark Nelson wrote:
>>>>>>> On 08/17/2013 06:13 AM, Oliver Daudey wrote:
>>>>>>>> Hey all,
>>>>>>>>
>>>>>>>> This is a copy of Bug #6040 (http://tracker.ceph.com/issues/6040) I
>>>>>>>> created in the tracker. Thought I would pass it through the list as
>>>>>>>> well, to get an idea if anyone else is running into it. It may only
>>>>>>>> show under higher loads. More info about my setup is in the bug-report
>>>>>>>> above. Here goes:
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm running a Ceph-cluster with 3 nodes, each of which runs a mon, osd
>>>>>>>> and mds. I'm using RBD on this cluster as storage for KVM, CephFS is
>>>>>>>> unused at this time. While still on v0.61.7 Cuttlefish, I got
>>>>>>>> 70-100+ MB/sec on simple linear writes to a file with `dd' inside a VM on this
>>>>>>>> cluster under regular load and the osds usually averaged 20-100%
>>>>>>>> CPU-utilisation in `top'. After the upgrade to Dumpling, CPU-usage for
>>>>>>>> the osds shot up to 100% to 400% in `top' (multi-core system) and the
>>>>>>>> speed for my writes with `dd' inside a VM dropped to 20-40MB/sec. Users
>>>>>>>> complained that disk-access inside the VMs was significantly slower and
>>>>>>>> the backups of the RBD-store I was running, also got behind quickly.
>>>>>>>>
>>>>>>>> After downgrading only the osds to v0.61.7 Cuttlefish and leaving the
>>>>>>>> rest at 0.67 Dumpling, speed and load returned to normal. I have
>>>>>>>> repeated this performance-hit upon upgrade on a similar test-cluster
>>>>>>>> under no additional load at all. Although CPU-usage for the osds wasn't
>>>>>>>> as dramatic during these tests because there was no base-load from
>>>>>>>> other
>>>>>>>> VMs, I/O-performance dropped significantly after upgrading during these
>>>>>>>> tests as well, and returned to normal after downgrading the osds.
>>>>>>>>
>>>>>>>> I'm not sure what to make of it. There are no visible errors in the
>>>>>>>> logs
>>>>>>>> and everything runs and reports good health, it's just a lot slower,
>>>>>>>> with a lot more CPU-usage.
>>>>>>>
>>>>>>> Hi Oliver,
>>>>>>>
>>>>>>> If you have access to the perf command on this system, could you try
>>>>>>> running:
>>>>>>>
>>>>>>> "sudo perf top"
>>>>>>>
>>>>>>> And if that doesn't give you much,
>>>>>>>
>>>>>>> "sudo perf record -g"
>>>>>>>
>>>>>>> then:
>>>>>>>
>>>>>>> "sudo perf report | less"
>>>>>>>
>>>>>>> during the period of high CPU usage? This will give you a call graph.
>>>>>>> There may be symbols missing, but it might help track down what the OSDs
>>>>>>> are doing.
>>>>>>
>>>>>> Thanks for your help! I did a couple of runs on my test-cluster,
>>>>>> loading it with writes from 3 VMs concurrently and measuring the results
>>>>>> at the first node with all 0.67 Dumpling-components and with the osds
>>>>>> replaced by 0.61.7 Cuttlefish. I let `perf top' run and settle for a
>>>>>> while, then copied anything that showed in red and green into this post.
>>>>>> Here are the results (sorry for the word-wraps):
>>>>>>
>>>>>> First, with 0.61.7 osds:
>>>>>>
>>>>>> 19.91% [kernel] [k] intel_idle
>>>>>> 10.18% [kernel] [k] _raw_spin_lock_irqsave
>>>>>> 6.79% ceph-osd [.] ceph_crc32c_le
>>>>>>      4.93%  [kernel]                    [k] default_send_IPI_mask_sequence_phys
>>>>>> 2.71% [kernel] [k] copy_user_generic_string
>>>>>> 1.42% libc-2.11.3.so [.] memcpy
>>>>>> 1.23% [kernel] [k] find_busiest_group
>>>>>> 1.13% librados.so.2.0.0 [.] ceph_crc32c_le_intel
>>>>>> 1.11% [kernel] [k] _raw_spin_lock
>>>>>> 0.99% kvm [.] 0x1931f8
>>>>>> 0.92% [igb] [k] igb_poll
>>>>>> 0.87% [kernel] [k] native_write_cr0
>>>>>> 0.80% [kernel] [k] csum_partial
>>>>>> 0.78% [kernel] [k] __do_softirq
>>>>>> 0.63% [kernel] [k] hpet_legacy_next_event
>>>>>> 0.53% [ip_tables] [k] ipt_do_table
>>>>>> 0.50% libc-2.11.3.so [.] 0x74433
>>>>>>
>>>>>> Second test, with 0.67 osds:
>>>>>>
>>>>>> 18.32% [kernel] [k] intel_idle
>>>>>> 7.58% [kernel] [k] _raw_spin_lock_irqsave
>>>>>> 7.04% ceph-osd [.] PGLog::undirty()
>>>>>> 4.39% ceph-osd [.] ceph_crc32c_le_intel
>>>>>>      3.92%  [kernel]            [k] default_send_IPI_mask_sequence_phys
>>>>>> 2.25% [kernel] [k] copy_user_generic_string
>>>>>> 1.76% libc-2.11.3.so [.] memcpy
>>>>>> 1.56% librados.so.2.0.0 [.] ceph_crc32c_le_intel
>>>>>> 1.40% libc-2.11.3.so [.] vfprintf
>>>>>> 1.12% libc-2.11.3.so [.] 0x7217b
>>>>>> 1.05% [kernel] [k] _raw_spin_lock
>>>>>> 1.01% [kernel] [k] find_busiest_group
>>>>>> 0.83% kvm [.] 0x193ab8
>>>>>> 0.80% [kernel] [k] native_write_cr0
>>>>>> 0.76% [kernel] [k] __do_softirq
>>>>>> 0.73% libc-2.11.3.so [.] _IO_default_xsputn
>>>>>> 0.70% [kernel] [k] csum_partial
>>>>>> 0.68% [igb] [k] igb_poll
>>>>>> 0.58% [kernel] [k] hpet_legacy_next_event
>>>>>> 0.54% [kernel] [k] __schedule
>>>>>>
>>>>>>
>>>>>> What jumps right out is "PGLog::undirty()", which doesn't show up
>>>>>> with 0.61.7 at all, but sits right near the top of CPU-usage in 0.67.
>>>>>> Note that I didn't manage to fully load the test-cluster CPU-wise,
>>>>>> because of network-constraints and I don't want to take any extra risks
>>>>>> on the production-cluster and test it there, but it seems we found a
>>>>>> possible culprit.
>>>>>>
>>>>>> Any ideas? Thanks again!
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Oliver
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> [email protected]
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>
>>
>