On Thu, Nov 13, 2014 at 9:35 PM, Nir Soffer <nsof...@redhat.com> wrote:
> ----- Original Message -----
>> From: "Francesco Romani" <from...@redhat.com>
>> To: "engine-de...@ovirt.org" <devel@ovirt.org>
>> Sent: Thursday, November 13, 2014 6:47:04 PM
>> Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks
>>
>> ----- Original Message -----
>> > From: "Piotr Kliczewski" <piotr.kliczew...@gmail.com>
>> > To: "Francesco Romani" <from...@redhat.com>
>> > Cc: "engine-de...@ovirt.org" <devel@ovirt.org>
>> > Sent: Monday, November 10, 2014 4:06:38 PM
>> > Subject: Re: [ovirt-devel] [VDSM] [JSONRPC] early, coarse grained benchmarks
>> >
>> > Great work!
>> >
>> > On Mon, Nov 10, 2014 at 3:50 PM, Francesco Romani <from...@redhat.com> wrote:
>> > > Hi everyone,
>> > >
>> > > I'd like to share initial profiles of VDSM comparing XMLRPC and JSONRPC,
>> > > plus two patches optimizing the latter.
>> > >
>> > > While the low-level profiler (yappi) is now working nicely, and this is
>> > > a good building block, I'm not really satisfied with the testcase -
>> > > although it is a fair approximation of reality; most importantly, I'm not
>> > > satisfied with the benchmark driver, which is too coarse grained.
>> > >
>> > > The benchmark driver is a simple python script using the REST APIs of Engine.
>> > >
>> > > I'd like to be able to bypass Engine, but vdsClient is not good enough:
>> > > no JSONRPC option yet.
>> > >
>> >
>> > Once I have a bit of time I will work on it.
>>
>> Well, it has been on my list as well for a long time, so no pressure.
>>
>> I'm profiling four major flows at the moment, and improving tools along the way.
>>
>> Here's my series, still being worked on and not yet ready for review (still
>> draft for this reason):
>>
>> http://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:profile_runtime,n,z
>>
>> Here we have
>> 1. one patch to start/stop profiling at runtime using vdsClient
>
> Note that stopping yappi may segfault, as yappi uses unsafe iteration
> over a Python internal linked list without locking.
>
> If we add such an option, it should work only from a local connection.
>
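For reference, the runtime start/stop in that first patch boils down to something like
the yappi usage below. This is a minimal sketch only, not the actual patch code: the
helper names and the pstats-format dump are illustrative, and the warning above about
stopping yappi on a loaded vdsm applies here.

    import yappi

    def start_profiling():
        # begin collecting per-function stats for all threads
        yappi.start()

    def stop_profiling(path):
        yappi.stop()
        stats = yappi.get_func_stats()
        stats.save(path, type="pstat")  # pstats-compatible dump for offline analysis
        yappi.clear_stats()

Exposing these as verbs is where the local-connection-only restriction suggested above
would apply.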
>> 2. one simple (simplistic?) benchmark runner script
>>
>> More will come as the need arises. A proper wiki page about how to profile
>> VDSM is planned as well.
>>
>> The flows which I'm testing are
>> 1. VM startup
>> 2. VM shutdown
>> 3. VM list
>> 4. VM getAllVmStats
>>
>> with focus on the VDSM side, not the whole flow; thus, I'm intentionally
>> ruling out Engine from the picture.
>>
>> 1 and 2 because of https://bugzilla.redhat.com/show_bug.cgi?id=1155146
>> 3 and 4 because of https://bugzilla.redhat.com/show_bug.cgi?id=1110835
>>
>> So far, the profile of #2 was the most puzzling, as no clear bottleneck has
>> emerged yet. A new, deeper round is scheduled ASAP.
>> I found optimizations that affect 1, 3 and 4, especially with JSONRPC.
>> Unfortunately, I can't yet test with vdsClient (see above). Will follow up
>> on that ASAP.
>>
>> Patches, in order of readiness for review:
>>
>> http://gerrit.ovirt.org/#/c/35024/
>> http://gerrit.ovirt.org/#/c/34989/
>> http://gerrit.ovirt.org/#/c/35139 (DRAFT)
>> http://gerrit.ovirt.org/#/c/35140 (DRAFT)
>>
>> Let's see the effect of these on the #3 flow.
>> Scenario:
>> RHEL 6.5 on minidell running VDSM master + patches, with 150 VMs, each with
>> 1 VCPU, 16 MB (megs) RAM and 1 GB HDD on iSCSI.
>
> What is the cpu usage and load during the test? In previous tests you had a
> nice graph with this data.
>
>> Run 300 times "vdsClient -s 0 list table"
>
> This is not a good test - you just overload vdsm. Instead, run this verb at a
> more typical rate (like engine does) and limit the test by the running time,
> not by the number of calls.
>
> Running like this will show that the patched version needs less cpu and
> completes more calls during the test duration (e.g. 5 minutes).
>
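Something like the sketch below would do that: a driver bounded by wall-clock time,
issuing the verb at a fixed, engine-like interval and reporting latency percentiles
instead of a raw call count. Illustrative only, not the benchmark script from the
series; the 15-second interval and the use of vdsClient (hence xmlrpc) are assumptions.

    import os
    import subprocess
    import time

    DURATION = 300          # seconds; bound the test by time, not by call count
    INTERVAL = 15           # seconds between calls, roughly engine-like polling
    CMD = ["vdsClient", "-s", "0", "list", "table"]

    def run_once():
        start = time.time()
        with open(os.devnull, "w") as devnull:
            subprocess.check_call(CMD, stdout=devnull)
        return time.time() - start

    def percentile(sorted_samples, pct):
        idx = int(round((pct / 100.0) * (len(sorted_samples) - 1)))
        return sorted_samples[idx]

    def main():
        samples = []
        deadline = time.time() + DURATION
        while time.time() < deadline:
            next_call = time.time() + INTERVAL
            samples.append(run_once())
            # sleep out the rest of the interval so the call rate stays
            # constant instead of hammering vdsm back-to-back
            time.sleep(max(0.0, next_call - time.time()))
        samples.sort()
        print("calls completed: %d" % len(samples))
        for pct in (50, 90, 95, 99):
            print("p%d: %.0fms" % (pct, percentile(samples, pct) * 1000))

    if __name__ == "__main__":
        main()

Running this for the same 5 minutes against vanilla and patched vdsm gives directly
comparable completed-call counts and percentiles.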
>> vanilla VDSM:
>> average=6985ms min=3220ms max=11880ms
>> response time: percentage
>> <3000ms:   0.0
>> <3500ms:   8.3
>> <4000ms:  18.3
>> <4500ms:  23.0
>> <5000ms:  27.7
>> <5500ms:  31.0
>> <6000ms:  36.3
>> <6500ms:  41.0
>> <7000ms:  49.0
>> <7500ms:  54.7
>> <8000ms:  59.0
>> <8500ms:  66.0
>> <9000ms:  71.7
>> <9500ms:  79.7
>> <10000ms: 88.3
>> <10500ms: 96.3
>> <11000ms: 99.0
>> <11500ms: 99.7
>> <12000ms: 100.0
>>
>> patched VDSM:
>
> What is "patched"? Which patches are applied?
>
>> average=5761ms min=2830ms max=11450ms
>> response time: percentage
>> <3000ms:   0.3
>> <3500ms:   6.7
>> <4000ms:  20.7
>> <4500ms:  30.7
>> <5000ms:  39.7
>> <5500ms:  48.0
>> <6000ms:  56.3
>> <6500ms:  66.3
>> <7000ms:  72.3
>> <7500ms:  80.0
>> <8000ms:  87.7
>> <8500ms:  94.3
>> <9000ms:  97.7
>> <9500ms:  98.0
>> <10000ms: 98.7
>> <10500ms: 99.3
>> <11000ms: 99.7
>> <11500ms: 100.0
>> <12000ms: 100.0
>>
>> See attachment for an oocalc spreadsheet with raw samples and graph.
>>
>> Profiles:
>> Vanilla VDSM:
>>    243874010 function calls (349761179 primitive calls) in 2612.654 seconds
>>
>>    Ordered by: internal time
>>    List reduced from 883 to 20 due to restriction <20>
>>
>>    ncalls              tottime  percall  cumtime  percall  filename:lineno(function)
>>    196905/49111117     500.567    0.003 1002.618    0.000  copy.py:144(deepcopy)
>>    120996/4106203      217.146    0.002  998.929    0.000  copy.py:251(_deepcopy_dict)
>>    39443308            192.752    0.000  217.518    0.000  copy.py:261(_keep_alive)
>>    859                 149.007    0.173  149.466    0.174  storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
>>    601/1891502         147.901    0.246  579.963    0.000  xmlrpclib.py:765(Marshaller.dump_struct)
>>    65493648            128.666    0.000  169.473    0.000  string.py:511(replace)
>>    7858586/32213416     95.784    0.000  209.567    0.000  encoder.py:284(JSONEncoder._iterencode)
>>    21831216             89.865    0.000  259.338    0.000  xmlrpclib.py:167(escape)
>>    7858587/20679836     80.741    0.000  196.755    0.000  encoder.py:213(JSONEncoder._iterencode_dict)
>>    601/12515060         66.632    0.111  579.970    0.000  xmlrpclib.py:683(Marshaller.__dump)
>>    227968               50.938    0.000   59.783    0.000  __init__.py:764(UserGroupEnforcingHandler.emit)
>>    155279               45.445    0.000    5.345    0.000  posixpath.py:94(splitext)
>>    133463               45.351    0.000  104.987    0.001  __init__.py:1175(Logger.handle)
>>    7290900              45.228    0.000  149.552    0.000  xmlrpclib.py:744(Marshaller.dump_unicode)
>>    601                  43.538    0.072    1.056    0.002  socket.py:196(_socketobject.accept)
>>    35095608             31.967    0.000   31.967    0.000  copy.py:197(_deepcopy_atomic)
>>    50498                24.912    0.000   55.671    0.001  inspect.py:247(getmembers)
>>    7479142/8217142      24.764    0.000  150.280    0.000  encoder.py:180(JSONEncoder._iterencode_list)
>>    75750                19.871    0.000   75.819    0.001  vm.py:2271(Vm._getRunningVmStats)
>>    4447686              17.603    0.000   17.603    0.000  sampling.py:323(AdvancedStatsFunction.interval)
>>
>>
>> Patched VDSM:
>>    165586271 function calls (181657572 primitive calls) in 2406.080 seconds
>>
>>    Ordered by: internal time
>>    List reduced from 1045 to 20 due to restriction <20>
>>
>>    ncalls              tottime  percall  cumtime  percall  filename:lineno(function)
>>    51229/51594         198.728    0.004  184.402    0.004  Monitor.py:182(GuestMonitor._should_run)
>>    234244              183.784    0.001  570.998    0.002  vm.py:2274(Vm._getRunningVmStats)
>>    647/1891502         173.119    0.268  671.222    0.000  xmlrpclib.py:765(Marshaller.dump_struct)
>>    66247525/66247536   150.364    0.000  192.790    0.000  string.py:511(replace)
>>    653                 118.867    0.182  119.219    0.183  storage_mailbox.py:621(SPM_MailMonitor._handleRequests)
>>    22052164/22082512    90.992    0.000  294.362    0.000  xmlrpclib.py:167(escape)
>>    637/12640708         75.551    0.119  671.442    0.000  xmlrpclib.py:683(Marshaller.__dump)
>>    5096229/5096259      66.229    0.000   64.007    0.000  utils.py:296(convertToStr)
>>    7200900              49.677    0.000  161.784    0.000  xmlrpclib.py:744(Marshaller.dump_unicode)
>>    1318320              39.141    0.000   39.141    0.000  vm.py:610(compute_latency)
>>    57937/57940          37.975    0.001   80.458    0.001  inspect.py:247(getmembers)
>>    234244               36.289    0.000  155.929    0.001  vm.py:519(VmStatsThread._getDiskStats)
>>    234244               35.013    0.000   46.974    0.000  vm.py:2359(Vm._getGraphicsStats)
>>    365102               24.819    0.000  106.858    0.000  glob.py:18(iglob)
>>    234244               23.588    0.000   32.139    0.000  vm.py:402(VmStatsThread._getBalloonStats)
>>    439440               22.295    0.000   22.295    0.000  vm.py:595(_calcDiskRate)
>>    439440               22.178    0.000   61.320    0.000  vm.py:607(_calcDiskLatency)
>>    834132               21.402    0.000   21.402    0.000  vm.py:1311(Vm._get_lastStatus)
>>    1071684              19.455    0.000   24.374    0.000  pthreading.py:77(Lock.release)
>>    2961805              18.907    0.000   61.027    0.000  xmlrpclib.py:737(Marshaller.dump_string)
>>
>
> This looks strange:
>
> vanilla vdsm:  75750   19.871  0.000   75.819  0.001  vm.py:2271(Vm._getRunningVmStats)
> patched vdsm:  234244  183.784 0.001  570.998  0.002  vm.py:2274(Vm._getRunningVmStats)
>
> The second profile is doing about 3 times more Vm._getRunningVmStats calls,
> which take 10 times more cpu time(?!).
>
> We don't have any info on the load - maybe vdsm is overloaded in both runs,
> which can explain why it does 1/3 of the work vs the patched version, but the
> patched version is probably overloaded as well, which can explain the much
> higher cpu time.
>
> I suggest checking the cpu usage, and do *not* take profiles where vdsm uses
> more than 100% cpu.
>
> Let's profile a much lower load to understand where the time is taken.
>
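One way to do that is to sample the vdsm process cpu usage while the driver runs, and
discard any profile taken while the process is pegged. A minimal sketch, reading
/proc/<pid>/stat directly; the USER_HZ value and the one-second sampling interval are
assumptions:

    import time

    CLK_TCK = 100  # assumption: USER_HZ, typically 100 (see: getconf CLK_TCK)

    def cpu_ticks(pid):
        # fields 14 and 15 of /proc/<pid>/stat are utime and stime, in ticks;
        # this simple split assumes the comm field contains no spaces
        with open("/proc/%d/stat" % pid) as f:
            fields = f.read().split()
        return int(fields[13]) + int(fields[14])

    def monitor(pid, duration, interval=1.0):
        prev = cpu_ticks(pid)
        for _ in range(int(duration / interval)):
            time.sleep(interval)
            cur = cpu_ticks(pid)
            # percent of one core used by vdsm over the last interval
            print("vdsm cpu: %.1f%%" % (100.0 * (cur - prev) / CLK_TCK / interval))
            prev = cur

Any profiling window where this stays near or above 100% says more about saturation
than about where the time actually goes.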
>> Open points:
>> * profile the remaining flows again and report like this (in progress for getAllVmStats)
>> * add wiki page
>> * find out why there is such a high minimum price for the list verb
>> * profile JSONRPC (needs patches to vdsClient)
>
> Comparing jsonrpc would be very interesting - but we don't have a jsonrpc
> client yet, so you better use the engine for this instead of vdsclient.
>

I can help to reduce engine noise so we can profile using both protocols.

> Nir

_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel