http://bugzilla.novell.com/show_bug.cgi?id=487846
User [email protected] added comment http://bugzilla.novell.com/show_bug.cgi?id=487846#c31 --- Comment #31 from Steven Munroe <[email protected]> 2009-09-07 21:22:23 MDT --- Some observations: Mono JIT is generating alot of branch +4 (branch to the next instruction following the branch) instructions. This is bad for a number of reasons the primary being that the branch terminates the (super scalar multi-instruction) dispatch group. As this branch is nominally redundant it lowers the CPI an hurts overall performance. However this branch+4 is useful in the load-hit-store case. For higher bandwidth POWER uses a Store Queue between the LSU and the L2 cache. But in the special case of a load of the same data recently storage the load may experience a pipe line data (waiting for the data to reach a bypass stage or exit the STQ into the L2). The worst case is when the load immediately follows the store instruction and the IFU attempts to dispatch both in the same group (as can happen on O-O-O machines like P4/P5/P7). This is detected later and results in an instruction reject/flush/refetch starting with the load (dozens of cycles delay). This hazard is normally rare as POWER has lots of registers (32) and compilers normally allocate locals to registers and avoid the reload. The Mono JIT does not seems to be allocating locals to registers aggressively enough to avoid the reload case (even when many registers are available). I observe that many of the b+4 cases are the existing the if/then/else. And many of those then/else blocks are setting (stores) local variables that are referenced (loaded) in the next basic block. So in this case a simply noping the branch may give worse performance then leaving it in place. So a peep-hole optimization that checks for this case is required. If there is non-LHS case then simply nop the branch. In the LHS case where the store/load use the same register and we can prove the that there is only one entry the block with the load, we can nop both the branch and the load. In the LHS case where the store/load use a different register and we can prove the that there is only one entry the block with the load, we can nop the branch and replace the load with a move register. If we can't prove that there is only one entry into the following block and the store/load use the same register, we can patch the branch to skip the load (branch+8). This requires some new infrastructure to determine if the load target has multiple entries, and avoiding special cases (MONO_PATCH_INFO_METHOD_JUMP) where the branch may be relocated several times (this can generate asserts then the second patch finds a nop instead of a branch). I am working through this and hope to have a patch to review soon. This is helping me to learn about mono-internals leads into the larger issue of optimizing calls and pointer loads. -- Configure bugmail: http://bugzilla.novell.com/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the QA contact for the bug. _______________________________________________ mono-bugs maillist - [email protected] http://lists.ximian.com/mailman/listinfo/mono-bugs
