On Sun, May 15, 2011 at 5:03 PM, Aurelien Jarno <aurel...@aurel32.net> wrote:
> On Sun, May 15, 2011 at 04:42:05PM +0300, Blue Swirl wrote:
>> On Sun, May 15, 2011 at 4:02 PM, Aurelien Jarno <aurel...@aurel32.net> wrote:
>> > On Sun, May 15, 2011 at 03:37:00PM +0300, Blue Swirl wrote:
>> >> On Sun, May 15, 2011 at 3:14 PM, Laurent Desnogues
>> >> <laurent.desnog...@gmail.com> wrote:
>> >> > On Sun, May 15, 2011 at 1:33 PM, Blue Swirl <blauwir...@gmail.com>
>> >> > wrote:
>> >> > [...]
>> >> >>> x86_64 uses r14 as TCG_AREG0. Despite the instructions being quite
>> >> >>> simple (only 2 movi_i32), the resulting code makes 2 access to env to
>> >> >>> save the two registers. Having to reload the env pointer each time to
>> >> >>> a register would clearly increase the size of this TB.
>> >> >>
>> >> >> I don't think TCG would be that simple, instead the pointer would be
>> >> >> loaded only once in this case.
>> >> >
>> >> > Assuming TCG was able to allocate a register for that,
>> >> > it would be live at most for one TB, so you'd have to
>> >> > load it at least once per TB, and with block chaining
>> >> > that wouldn't be efficient as you'd keep on reloading it.
>> >>
>> >> Yes, but if there are better uses, the register can be flushed. Now
>> >> this is not possible since the register is always unavailable.
>> >>
>> >
>> > What are the better uses, that justify to flush a register that is going
>> > to be used three or four host asm later?
>>
>> It would obviously replace something else determined by TCG.
>
> The register will be free only for a few host instructions. Could you
> please give more concrete example about such a usage?
>
>> > In the current generated code, roughly one every four instruction
>> > reference TCG_AREG0, so this register is really needed very often.
>> >
>> > If you think TCG will be faster by having one more register in between
>> > I suggest you to first optimize tcg_reg_alloc(), which simply spill
>> > a random register, even if they are some allocated register that won't
>> > be used until the end of the TB. You should also should check how often
>> > TCG spills a register (in which case it would have benefit from one more
>> > register). It happens less than 2000 times when booting an emulated mips
>> > system on x86_64, while more than 160000 TB are generated.
>>
>> Right, on a modern CPU with lots of registers, one additional register
>> won't be helpful, but on i386 the situation should be very different,
>> there are very few registers.
>>
>
> On i386, I indeed get a lot more of spilled registers, that is 340000. Still
> that number is not that high, it's less than two times per TB. If we
> consider that these register spills are pure loss (which is not always
> the case, sometime the spilled register is actually never used later, so
> it's just an anticipated save), that's 4 load/store per TB.
>
> It means to compensate, the env register should not be loaded more than
> 4 times in a TB, which looks like quite difficult to achieve given how
> often this register is used.
>
> Please also note that spilling globals currently need access to the env
> pointer, which might not be loaded, so another register spill is need to
> load it. This will make the code a lot more complex than now to avoid a
> deadlock (probably by spilling local temps first).

OK, this doesn't look so attractive after all.
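
For the record, the break-even arithmetic quoted above can be written down as a rough sketch. The function names are made up for illustration and are not part of TCG. The sketch assumes, as the quoted mail does, that every spill is pure loss (one store plus one later reload), and additionally assumes, optimistically, that freeing the env register would avoid every spill in the TB.

```python
def spills_per_tb(total_spills, total_tbs):
    """Average number of register spills per translation block."""
    return total_spills / total_tbs


def spill_accesses_per_tb(spills, accesses_per_spill=2):
    """Memory accesses per TB caused by spills.

    Pessimistic assumption from the thread: each spill costs one store
    and one later reload, hence 2 accesses per spill.
    """
    return spills * accesses_per_spill


def env_reloads_within_budget(env_reloads, spills_avoided):
    """True if reloading env costs no more than the spills it would avoid.

    Optimistic assumption (hypothetical): freeing the env register
    removes all of the given spills.
    """
    return env_reloads <= spill_accesses_per_tb(spills_avoided)


# x86_64 host: fewer than 2000 spills for more than 160000 TBs,
# so this is an upper bound on the spill rate.
x86_64_rate = spills_per_tb(2000, 160000)

# i386 host: "less than two" spills per TB, so at most 4 accesses per TB.
i386_budget = spill_accesses_per_tb(2)

print(f"x86_64: at most {x86_64_rate:.4f} spills per TB")
print(f"i386: env may be reloaded at most {i386_budget} times per TB")
```

With roughly one host instruction in four referencing TCG_AREG0, a TB would have to be very short, or the reloaded pointer kept live for a long stretch, to stay within that budget.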