On Sun, May 15, 2011 at 12:16 AM, Aurelien Jarno <aurel...@aurel32.net> wrote:
> On Sat, May 14, 2011 at 10:35:20PM +0300, Blue Swirl wrote:
>> Here's a RFC series for eliminating AREG0.
>>
>> Blue Swirl (11):
>>   Move user emulator stuff from cpu-exec.c to user-exec.c
>>   Delete unused tb_invalidate_page_range
>>
>> The above should be OK to commit.
>>
>>   cpu_loop_exit: avoid using AREG0
>>   Delegate setup of TCG temporaries to targets
>>
>> These two are not, unless the overall plan is OK.
>>
>>   TCG: fix negative frame offset calculations
>>   TCG/x86: use stack for TCG temps
>>   TCG/Sparc64: use stack for TCG temps
>>
>> But these three should be OK. I've tested lightly x86_64 and Sparc64 hosts.
>>
>>   Add CONFIG_TARGET_NEEDS_AREG0
>>   Don't compile legacy qemu_ld/st functions if target doesn't need them
>>
>> Should be OK, though the latter patch only touches x86.
>>
>>   Add new qemu_ld and qemu_st functions
>>   sparc: use new qemu_ld and qemu_st functions
>>
>> The last two compile but QEMU segfaults. I just made a naive
>> conversion for getting comments.
>>
>
> What is the goal behing removing TCG_AREG0? If it is speed improvement,
> can you please provide some benchmarks?

There was some discussion earlier about why this (or parts of the
conversion) may be a speed improvement:

http://article.gmane.org/gmane.comp.emulators.qemu/101826
http://article.gmane.org/gmane.comp.emulators.qemu/102156

There are no benchmarks yet. In fact, it may be difficult to make
those without performing the removal completely.

For example, patch 3/11 makes cpu_loop_exit take CPUState as a
parameter instead of using global env, which would be available and
the register is reserved anyway. This would only decrease performance
at this stage, unless a complete conversion is done. I suspect the
same would happen when moving all helpers from op_helper.c to
helper.c. But after the whole conversion, this would be a neutral (no
extra registers used) or even beneficial change (the code is free to
use one more register).

> The env register is used very often (basically for every load/store, but
> also a lot of helpers), so it makes sense to reserve a register for it.
>
> For what I understand from your patch series, you prefer to pass this
> register explicitly to TCG functions. This basically means this TCG
> global will be loaded to host register as soon as it is used, but also
> regularly, as globals are saved back to their canonical location before
> an helper or a load/store.
>
> So it seems that this patch series will just allowing the "env register"
> to change over time, though it will not spare one more register for the
> TCG code, and it will emit longer TCG code to regularly reload the env
> global into a host register.

But there will be one more register available in some cases. In other
cases, the number of registers used does not change. Moving the
registers around is what worries me too.

But there are other effects too, the helpers are now compiled so that
the global env register is not used. Especially on hosts with low
number of registers this is not optimal.

> In any case at then end benchmarks is what are need to decided, TCG has
> always shown that performance improvements doesn't match the improvement
> analysis.

If this turns out to be a bad idea, it means that the reverse
conversion will be beneficial and we should convert helper.c code to
op_helper.c and take advantage of the global env in a register. This
was actually my standpoint when the discussion started, but I'm
interested to see if this approach would work better.

There are other opportunities as well, for example while refactoring,
qemu_ld/st would be changed. Maybe your constant patches would be
useful there or they could benefit from a rearranged interface.

Reply via email to