Hi, Sangho

Thanks for the info! BTW, gprof has one drawback: it can't profile
multithreaded applications correctly, i.e. it gives wrong results. I recommend
Intel VTune Amplifier XE instead. It's a very useful tool, you can use it
without recompiling qemu, and it shows you everything: stack traces, profiles,
even SMP friendliness.

I've also done some more digging into this aio thing. Here's what I found: in
main-loop.c:os_host_main_loop_wait (the win32-specific code),
if I replace:

select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);

with:

qemu_mutex_unlock_iothread();
select_ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv0);
qemu_mutex_lock_iothread();

then the problem is almost entirely cured for portio (vga), but it's still
present for mmio (vigs). So the problem here is a livelock between the main
thread and the io thread. I'm currently studying the mmio part, i.e. we
probably need to put these
qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() calls somewhere else.
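
To make the pattern concrete outside of qemu, here is a minimal sketch of the
same unlock/poll/lock sequence. It assumes POSIX pthreads and select() rather
than the win32 code path, and big_lock and poll_unlocked are made-up names for
illustration, not qemu symbols. It only shows the shape of the change; it is
not a fix and does not explain the slowdown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical stand-in for the global iothread mutex; not a qemu symbol. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Zero-timeout poll with the big lock dropped around it.
 * The caller must hold big_lock on entry; it is held again on return.
 * nfds is passed to select() unchanged (highest fd + 1, or 0 for no fds).
 */
static int poll_unlocked(int nfds, fd_set *rfds, fd_set *wfds, fd_set *xfds)
{
    struct timeval tv0 = { 0, 0 };  /* zero timeout: poll, do not block */
    int rc, ret;

    rc = pthread_mutex_unlock(&big_lock);  /* other threads may take the lock here */
    assert(rc == 0);
    ret = select(nfds, rfds, wfds, xfds, &tv0);
    rc = pthread_mutex_lock(&big_lock);    /* may block if another thread got in */
    assert(rc == 0);
    (void)rc;                              /* silence unused warning under NDEBUG */
    return ret;
}
```

A caller would take big_lock once, call poll_unlocked() on each loop
iteration, and release the lock when the loop ends.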

Another thing that looks strange to me is that adding
qemu_mutex_unlock_iothread()/qemu_mutex_lock_iothread() makes so much
difference. The thing is, 'tv0' in that select is always 0, which means "poll
and return immediately", and this select actually does return immediately. So
why does unlock/lock make so much difference? I mean, if tv were > 0 then yes,
the main thread waits in select and the io thread livelocks on the mutex; that
makes sense, but not when tv is 0... I'm also studying this...

On 04/10/2014 10:48 AM, 박상호 wrote:
> Hi, Seokyeon and Stanislav
> 
>  
> 
> I profiled qemu on Windows using gprof (-pg). I ran the emulator until the 
> menu screen appeared and then shut it down. It took about 70 seconds. Please 
> check the attached result.
> 
>  
> 
> - Top ranks
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  16.48      1.05     1.05                             maru_vga_draw_line32_32
>  11.62      1.79     0.74                             __udivdi3
>   6.75      2.22     0.43                             os_host_main_loop_wait
>   5.65      2.58     0.36                             aio_ctx_prepare
>   5.34      2.92     0.34 111422776     0.00     0.00  qemu_mutex_unlock
>   5.34      3.26     0.34                             aio_ctx_check
>   5.18      3.59     0.33  8507037     0.00     0.00  slirp_pollfds_poll
>   3.77      3.83     0.24  8506993     0.00     0.00  slirp_pollfds_fill
>   3.14      4.03     0.20 76396706     0.00     0.00  timerlist_deadline_ns
>   2.67      4.20     0.17 25465512     0.00     0.00  timerlistgroup_deadline_ns
>   2.51      4.36     0.16                             __umoddi3
>   2.35      4.51     0.15  8506948     0.00     0.00  main_loop_wait
>   2.20      4.65     0.14 68485894     0.00     0.00  qemu_clock_get_ns
>   2.04      4.78     0.13  8507043     0.00     0.00  qemu_clock_run_all_timers
>   1.88      4.90     0.12 103165614     0.00     0.00  qemu_mutex_lock
>   1.88      5.02     0.12 25664993     0.00     0.00  timerlist_run_timers
> 
> Many functions related to aio and timerlist are called too frequently, as you 
> expected.
> 
> According to the call graph (from 1714 lines),
> 
> -----------------------------------------------
>                                   42             aio_poll <cycle 1> [94]
>                              8506969             main_loop_wait <cycle 1> [8]
>                 0.36    0.23 8458958/33329095     aio_ctx_check [4]
>                 0.36    0.23 8499543/33329095     aio_ctx_prepare [3]
> [16]     2.7    0.17    0.00 25465512         timerlistgroup_deadline_ns <cycle
>                              76396706             timerlist_deadline_ns <cycle 1
> -----------------------------------------------
> main_loop_wait(), aio_ctx_check() and aio_ctx_prepare() call 
> timerlistgroup_deadline_ns() almost evenly.
> 
> aio_ctx_check() and aio_ctx_prepare() are used as GSourceFuncs, so we can 
> reasonably suspect the aio implementation for win32.
> 
> main_loop_wait() also calls timerlistgroup_deadline_ns() excessively.
> 
>  
> 
> I also tested it on my Ubuntu box. I ran the emulator until the menu screen 
> appeared and then shut it down. It took about 20 seconds. Just compare the 
> number of calls to timerlistgroup_deadline_ns (25465512 in 70 seconds vs 
> 78696 in 20 seconds).
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>   9.09      0.04     0.04      642     0.06     0.08  vga_update_display
>   6.82      0.07     0.03    32540     0.00     0.00  main_loop_wait
>   6.82      0.10     0.03    30701     0.00     0.00  phys_page_set_level
>   4.55      0.12     0.02  5883501     0.00     0.00  address_space_translate_in
>   4.55      0.14     0.02  5883382     0.00     0.00  address_space_translate
>   4.55      0.16     0.02   189067     0.00     0.00  cpu_get_clock_locked
>   4.55      0.18     0.02      831     0.02     0.02  qcow2_check_metadata_overl
>   4.55      0.20     0.02                             aio_ctx_prepare
>   2.27      0.21     0.01  5952765     0.00     0.00  phys_page_find
>   2.27      0.22     0.01  5835718     0.00     0.00  qemu_get_ram_block
>   2.27      0.23     0.01  1177955     0.00     0.00  qemu_mutex_lock
> ...
> 
>   0.00      0.44     0.00   236252     0.00     0.00  timerlist_deadline_ns
> 
> ...
> 
> -----------------------------------------------
>                 0.00    0.00      42/78696       aio_poll <cycle 1> [70]
>                 0.00    0.00   19116/78696       aio_ctx_check [34]
>                 0.00    0.00   26975/78696       aio_ctx_prepare [21]
>                 0.00    0.00   32563/78696       main_loop_wait [5]
> [60]     2.1    0.00    0.01   78696         timerlistgroup_deadline_ns [60]
>                 0.00    0.01  236252/236252      timerlist_deadline_ns [59]
> -----------------------------------------------
> ...
> 
>  
> 
> In summary, the aio implementation for win32 may be the cause, but I still 
> don't know for sure. I need to think about the result more and check the aio 
> implementation.
> 
>  
> 
> ------- *Original Message* -------
> 
> *Sender* : 박상호<[email protected]> Principal Engineer/Part Leader/Core Part/S-Core
> 
> *Date* : 2014-04-09 16:46 (GMT+09:00)
> 
> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
> 
>  
> 
> Hi, Seokyeon Hwang
> 
> I'm afraid that the same performance degradation can happen in qemu 2.0, which 
> will be released on Apr. 10. (http://wiki.qemu.org/Planning/2.0)
> 
> I think that we need to keep digging into this issue until next week. :)
> 
> *From:*SeokYeon Hwang [mailto:[email protected]]
> *Sent:* Wednesday, April 09, 2014 3:11 PM
> *To:* Stanislav Vorobiov; [email protected]; 박상호
> *Subject:* Re: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
> 
>  
> 
> @ stanislav,
> 
> I see. You didn't want to apply W/A patch.
> 
> And... yes, we should study win32-aio.c in more detail.
> 
>  
> 
> I haven't tested the 3.12 kernel on a Windows host yet. I should try it.
> 
>  
> 
> @ sangho and all,
> 
> What is your opinion?
> 
>  
> 
>  
> 
> ------- *Original Message* -------
> 
> *Sender*: Stanislav Vorobiov <[email protected]> Expert Engineer/SRR-Tizen S/W Group/Samsung Electronics
> 
> *Date*: 2014-04-08 19:09 (GMT+09:00)
> 
> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
> 
>  
> 
> Hi, Seokyeon
> 
>> Yesterday, I looked up the related code and tested it.
>> But, I am not quite sure about the changed timer code in QEMU.
>>
>> The problem disappears with Stanislav's patch. However, I think adding a 
>> dummy notifier at timerlist registration is better than checking 
>> use_icount, given the current changed timer logic. I'm not 100% sure 
>> about this.
> I've tried the patch, it looks like the fix is almost the same as mine in 
> terms of performance, i.e. it makes things better, but not as good as in 1.6. 
> And the difference
> is big, with 1.6 performance was much better. IMHO we didn't fix the problem 
> yet and this patch or mine shouldn't be applied. I'll try to look at this 
> problem again taking
> this patch into account, I really hope that we'll find the right solution for 
> this...
> 
>>
>>
>> If anyone knows about the following, please answer me.
>>
>> 1. The main loop registers aio_notify to use its own timers. Why do the 6 
>> timerlists, which are created by the init_clocks() function in the CPU 
>> thread and IO thread, eventually call aio_notify? aio_notify is called 
>> because no notifier is registered explicitly.
>>
>> 2. The same timer logic as above runs on linux and Windows, but it is slow 
>> on Windows. What is the major cause of the performance decline on Windows?
> It might be that the aio logic broke for windows, i.e. misuse of the 
> IoCompletion api or something; maybe we should study win32-aio.c in more 
> detail?
> 
> Also, I noticed one more thing that may be related to this problem. The mobile 
> image doesn't boot with kernel 3.12 at all on windows; it hangs somewhere in
> network initialization (not 100% sure). That place also causes a little delay 
> with the 3.4 kernel, but with 3.12 it never gets past it. I've tried this both 
> without and with this patch. Also, Tizen IVI doesn't have this problem, it 
> boots fine.
> 
> On 04/08/2014 11:19 AM, SeokYeon Hwang wrote:
>> Sorry, my attachment was missing.
>> 
>>  
>> 
>> Thanks.
>> 
>>  
>> 
>> ------- *Original Message* -------
>> 
>> *Sender* : 황석연 Associate Principal Engineer/VM Part/S-Core
>> 
>> *Date* : 2014-04-08 16:11 (GMT+09:00)
>> 
>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>> 
>>  
>> 
>> Hi, everyone.
>> 
>>  
>> 
>> Sorry for late reply.
>> 
>> Yesterday, I looked up the related code and tested it.
>> But, I am not quite sure about the changed timer code in QEMU.
>> 
>> The problem disappears with Stanislav's patch. However, I think adding a 
>> dummy notifier at timerlist registration is better than checking 
>> use_icount, given the current changed timer logic. I'm not 100% sure 
>> about this.
>> 
>> 
>> If anyone knows about the following, please answer me.
>> 
>> 1. The main loop registers aio_notify to use its own timers. Why do the 6 
>> timerlists, which are created by the init_clocks() function in the CPU 
>> thread and IO thread, eventually call aio_notify? aio_notify is called 
>> because no notifier is registered explicitly.
>> 
>> 2. The same timer logic as above runs on linux and Windows, but it is slow 
>> on Windows. What is the major cause of the performance decline on Windows?
>> 
>> 
>> I'll apply Stanislav's patch or the attached "dummy_notifier patch" as a 
>> workaround if I cannot figure it out by the end of this week.
>> If you have any comments about this, please let me know.
>> 
>>  
>> 
>> Thanks.
>> 
>>  
>> 
>> ============================================================================================
>> 
>> *Sender*: Seokyeon Hwang
>> 
>> *Date*: 2014-03-14 10:35 (GMT+09:00)
>> 
>> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>> 
>>  
>> 
>> Great job, thanks.
>> 
>>  
>> 
>> I should test with "vanilla QEMU 1.6" on windows.
>> 
>> I think it could be our misuse of the QEMU timer API, or some other mistake in 
>> the tizen-specific devices.
>> 
>> I will test it by next week.
>> 
>>  
>> 
>> ------- *Original Message* -------
>> 
>> *Sender*: Stanislav Vorobiov Expert Engineer/SRR-Tizen S/W Group/Samsung Electronics
>> 
>> *Date*: 2014-03-14 02:22 (GMT+09:00)
>> 
>> *Title*: Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>> 
>>  
>> 
>> I was able to make some progress on this issue, it looks like this commit:
>> 
>> b1bbfe72ec1ebf302d97f886cc646466c0abd679 aio / timers: On timer 
>> modification, qemu_notify or aio_notify
>> 
>> causes the degradation. I'm attaching a patch that reverts the changes in this 
>> commit. Although the emulator performs better with this patch, it's still not 
>> as good as it was with qemu 1.6. Also, this patch is a dirty hack of course: 
>> it reverts generic code that works fine on linux and mac os x, while the 
>> problem is on windows only.
>> 
>> Any comments are welcome...
>> 
>> On 03/12/2014 02:59 PM, Stanislav Vorobiov wrote:
>>> Hi all,
>>> 
>>> Just for information, Intel VTune Amplifier XE for windows works great with 
>>> MinGW. It's capable of gathering correct profiles and symbol naming is ok; 
>>> you don't even need to build qemu with any special options.
>>> 
>>> I'm using it now to find the cause of this performance degradation; maybe 
>>> someone else will find it useful as well.
>>> 
>>> Thanks.
>>> 
>>> On 01/16/2014 06:38 AM, 황석연 wrote:
>>>> Dear all,
>>>>
>>>>  
>>>>
>>>> @ stanislav
>>>>
>>>> You are right. Performance profiling on Windows is a very hard job.
>>>>
>>>> Actually, on Windows - MinGW I prefer using a profiling tool to analysing 
>>>> sources by trial and error.
>>>>
>>>>  
>>>>
>>>> @ all
>>>>
>>>> If anyone knows a good profiling tool for Windows - MinGW,
>>>>
>>>> please let us know.
>>>>
>>>>  
>>>>
>>>> Thanks.
>>>>
>>>>  
>>>>
>>>>  
>>>>
>>>> ------- *Original Message* -------
>>>>
>>>> *Sender* : Stanislav Vorobiov Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>
>>>> *Date* : 2014-01-15 14:54 (GMT+09:00)
>>>>
>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>
>>>>  
>>>>
>>>> Hi, Syeon
>>>>
>>>> Yes, but unfortunately it's hard to say where exactly the problem is. It 
>>>> would be great to do some profiling, but on MinGW that doesn't seem to be
>>>> an easy task. In MinGW there are no tools such as valgrind or perf, and 
>>>> all existing windows profiling tools require a .pdb database,
>>>> which means they can only profile executables built by visual studio. 
>>>> After some struggling I managed to run qemu with gprof, which
>>>> gave me output with correct symbol naming, but unfortunately the output is 
>>>> still not useful, maybe because gprof is known not to
>>>> work correctly with multithreaded applications. Do you have suggestions 
>>>> on how we can profile qemu on windows? Are there any good tools
>>>> you know about?
>>>>
>>>> On 01/15/2014 08:35 AM, SeokYeon Hwang wrote:
>>>>> Dear all,
>>>>>
>>>>>  
>>>>>
>>>>> I can reproduce performance degradation on Windows.
>>>>>
>>>>> We should figure out why.
>>>>>
>>>>> I think it could be related to the timer logic changes in 1.7.0.
>>>>>
>>>>>  
>>>>>
>>>>> Thanks.
>>>>>
>>>>>  
>>>>>
>>>>> ------- *Original Message* -------
>>>>>
>>>>> *Sender* : Stanislav Vorobiov Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>>
>>>>> *Date* : 2014-01-13 14:52 (GMT+09:00)
>>>>>
>>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>>
>>>>>  
>>>>>
>>>>> Hi, Syeon
>>>>>
>>>>> It's not necessarily related to HAXM, the thing is slowdown is 
>>>>> significant, e.g. home screen renders about
>>>>> 5 times longer than before, home screen scrolling is like 2-3 fps. Other 
>>>>> graphics apps are also slow.
>>>>>
>>>>> On 01/13/2014 06:19 AM, 황석연 wrote:
>>>>>> Hi, stanislav,
>>>>>>
>>>>>>  
>>>>>>
>>>>>> As far as I remember, there are no significant changes related to 
>>>>>> HAXM.
>>>>>>
>>>>>> But I will re-check it.
>>>>>>
>>>>>>  
>>>>>>
>>>>>> ------- *Original Message* -------
>>>>>>
>>>>>> *Sender* : Stanislav Vorobiov Leading Engineer/SRR-Mobile S/W Group/Samsung Electronics
>>>>>>
>>>>>> *Date* : 2014-01-10 22:23 (GMT+09:00)
>>>>>>
>>>>>> *Title* : Re: [Dev] [SDK/Emulator] Tizen emulator on windows performance
>>>>>>
>>>>>>  
>>>>>>
>>>>>> Also, this happens both with maru VGA and VIGS
>>>>>>
>>>>>> On 01/10/2014 01:06 PM, Stanislav Vorobiov wrote:
>>>>>>> Hi, all
>>>>>>>
>>>>>>> After updating the tizen branch today (with the 1.7.0 merge) I've noticed 
>>>>>>> performance degradation on windows 7 64-bit with HAXM enabled.
>>>>>>> Is this a known issue? Were there significant changes to HAXM in the 
>>>>>>> 1.7.0 merge?
>>>>>>>
>>>>>>> On 01/08/2014 07:59 AM, 황석연 wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> A QEMU 1.7.0 stable version has been merged into tizen branch.
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> ------- *Original Message* -------
>>>>>>>>
>>>>>>>> *Sender* : 황석연 Senior Engineer/VM Part/S-Core
>>>>>>>>
>>>>>>>> *Date* : 2014-01-03 13:16 (GMT+09:00)
>>>>>>>>
>>>>>>>> *Title* : [Dev] [SDK/Emulator] Merge qemu stable-1.7.0 on tizen 
>>>>>>>> emulator
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> We have tested "Tizen Emulator" with the tizen_qemu_1.7 branch, and it 
>>>>>>>> works well.
>>>>>>>>
>>>>>>>> So we plan to merge it into the tizen branch next Tuesday, 7 Jan.
>>>>>>>>
>>>>>>>> If you have any opinion, please let me know.
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> *And please subscribe "Dev" mailing list on "tizen.org".*
>>>>>>>>
>>>>>>>> *https://lists.tizen.org/listinfo/dev*
>>>>>>>>
>>>>>>>> *I don't add any other recipients after this mail.*
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> @ John,
>>>>>>>>
>>>>>>>> Please forward this mail to IVI maintainer.
>>>>>>>>
>>>>>>>>  
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Dev mailing list
>>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>>> https://lists.tizen.org/listinfo/dev
>>>>>>>>
>>>>>>>
>>>>>>
> Sincerely, 박상호
> 
>  
> 
> Sangho Park (Ph.D)
> 
> Principal Engineer,
> 
> Core Part, OS Lab,
> 
> S-Core
> 
> Tel) +82-70-7125-5039
> 
> Mobile) +82-10-2546-9871
> 
> E-mail) [email protected] <mailto:[email protected]>
> 
>  
> 

_______________________________________________
Dev mailing list
[email protected]
https://lists.tizen.org/listinfo/dev
