Hi Uros,

Thank you for such a detailed explanation. I'll study it.

Regards,
Vladimir

2011/7/20 Uros Bizjak <ubiz...@gmail.com>:
> Hello!
>
>> >        * a/gcc/gcse.c (alloc_gcse_mem): Added code to run in PRE2.
>>
>> And this is necessary because...???
>>
>> Why not just make it a separate pass in ix86-reorg that uses LCM? Look at
>> mode switching for an example.
>
> I was also expecting that vzeroupper would be inserted in the same way
> as I387 mode switching instructions are inserted. To expand on
> Steven's suggestion, please see i386.h for OPTIMIZE_MODE_SWITCHING and
> following macros.
>
> At the moment, there are 4 separate entities that handle (four
> independent) insertions for mode switching for x87 for each mode of
> fistp or frndint instruction. Mode insertions will actually insert
> calculations of x87 control word (CW) at optimal points and push this
> new CW (together with old CW) to known stack slot to be consumed by
> fistp/frndint insn.
>
> You can add a new entity to enum ix86_entity (say, AVX_VZEROUPPER)
> and update OPTIMIZE_MODE_SWITCHING to perform mode insertion for
> AVX_VZEROUPPER entity when needed. Various modes for AVX_VZEROUPPER
> are defined in NUM_MODES_FOR_MODE_SWITCHING, mode transition in
> MODE_NEEDED and insn insertions in EMIT_MODE_SET.
>
> Please note that LCM handles all entities in parallel, so there is no
> need for extra passes. The real worker for mode switching is
> ix86_mode_needed, but don't forget that you can disable mode switching
> pass per-function when not needed through OPTIMIZE_MODE_SWITCHING
> macro.
>
> FYI: Existing x87 CW initialization insertion works this way:
> - fistp/frndint is inserted into insn stream and corresponding
> OPTIMIZE_MODE_SWITCHING flag is set.
> - inserted insn has i386_cw attribute that defines requested mode in
> which the insn operates. Based on this attribute, MODE_NEEDED handles
> mode transitions (please note that there are four independent
> entities) for each entity.
> - EMIT_MODE_SET emits CW initializations. These are further optimized
> by follow-up optimization passes, so two consecutive initializations
> at the same place are CSEd, etc.
>
> Uros.
>
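
For illustration only, the per-entity scheme described in the quoted text can be modelled with a small standalone C sketch. This is not GCC code and it makes several assumptions: every name in it (toy_entity, toy_insn, toy_mode_needed, toy_emit_mode_sets) is hypothetical, modes are plain integers, and it only walks a straight-line instruction sequence. In particular it does not do the LCM-based placement across basic blocks that the real mode switching pass performs; it only shows how independent entities each track a current mode and get a mode set when an insn needs a different one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for mode switching entities; not GCC's enum. */
enum toy_entity { TOY_X87_CW, TOY_AVX_VZEROUPPER, TOY_NUM_ENTITIES };

/* An insn that does not care about an entity's mode asks for TOY_MODE_ANY. */
enum { TOY_MODE_ANY = -1 };

struct toy_insn {
    const char *name;
    int needed[TOY_NUM_ENTITIES];  /* plays the role of an insn attribute */
};

/* Toy analogue of MODE_NEEDED: the mode INSN requires for ENTITY. */
static int toy_mode_needed(const struct toy_insn *insn, int entity)
{
    return insn->needed[entity];
}

/* Walk N insns in order and return how many mode sets would be emitted
   (the toy analogue of EMIT_MODE_SET).  Each entity is tracked
   independently; a set is emitted only when the needed mode differs
   from the entity's current mode.  */
static int toy_emit_mode_sets(const struct toy_insn *insns, int n, int verbose)
{
    int current[TOY_NUM_ENTITIES];
    int emitted = 0;

    assert(n >= 0);

    /* The mode of every entity is unknown on entry. */
    for (int e = 0; e < TOY_NUM_ENTITIES; e++)
        current[e] = TOY_MODE_ANY;

    for (int i = 0; i < n; i++) {
        for (int e = 0; e < TOY_NUM_ENTITIES; e++) {
            int need = toy_mode_needed(&insns[i], e);

            if (need != TOY_MODE_ANY && need != current[e]) {
                if (verbose)
                    printf("entity %d: set mode %d before %s\n",
                           e, need, insns[i].name);
                current[e] = need;
                emitted++;
            }
        }
    }
    return emitted;
}
```

Calling toy_emit_mode_sets on a sequence where two consecutive insns need the same mode for one entity yields a single mode set for that entity, since the second insn finds the mode already current. Adding another entity only means extending the enum and the needed[] initializers, which loosely mirrors the suggestion of adding a new entity rather than a new pass.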