Doesn't modern hardware have pretty good branch prediction? In that case, the order of the branches may not matter much unless it's a long chain of calls, as opposed to, say, an inner loop that hasn't been inlined?
Either way, I'd love to stay in the loop on this topic. For work, I'm building a strongly normalizing language that supports both strict and call-by-need evaluation strategies.

On Friday, October 23, 2015, Ryan Newton <rrnew...@gmail.com> wrote:

>> 1. Small tweaks: The CMM code above seems to be *betting* that the
>> thunk is unevaluated, because it does the stack check and stack write
>> *before* the predicate test that checks if the thunk is evaluated (if
>> (R1 & 7 != 0) goto c3aO; else goto c3aP;). With a bang-pattern
>> function, couldn't it make the opposite bet? That is, branch on whether
>> the thunk is evaluated first, and then the wasted computation is only a
>> single correctly predicted branch (and a read of a tag that we need to
>> read anyway).
>
> Oh, a small further addition would be needed for this tweak. In the
> generated code above "Sp = Sp + 8;" happens *late*, but I think it could
> happen right after the call to the thunk. In general, does it seem
> feasible to separate the slowpath from fastpath as in the following tweak
> of the example CMM?
>
>   // Skip to the chase if it's already evaluated:
>   start:
>       if (R2 & 7 != 0) goto fastpath; else goto slowpath;
>
>   slowpath: // Formerly c3aY
>       if ((Sp + -8) < SpLim) goto c3aZ; else goto c3b0;
>   c3aZ:
>       // nop
>       R1 = PicBaseReg + foo_closure;
>       call (I64[BaseReg - 8])(R2, R1) args: 8, res: 0, upd: 8;
>   c3b0:
>       I64[Sp - 8] = PicBaseReg + block_c3aO_info;
>       R1 = R2;
>       Sp = Sp - 8;
>
>       call (I64[R1])(R1) returns to fastpath, args: 8, res: 8, upd: 8;
>       // Sp bump moved to here so it's separate from "fastpath"
>       Sp = Sp + 8;
>
>   fastpath: // Formerly c3aO
>       if (R1 & 7 >= 2) goto c3aW; else goto c3aX;
>   c3aW:
>       R1 = P64[R1 + 6] & (-8);
>       call (I64[R1])(R1) args: 8, res: 0, upd: 8;
>   c3aX:
>       R1 = PicBaseReg + lvl_r39S_closure;
>       call (I64[R1])(R1) args: 8, res: 0, upd: 8;
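
For anyone following along without the original source at hand, here is a minimal Haskell sketch of the kind of bang-pattern function that could plausibly compile to Cmm shaped like the quoted example. This is an assumption on my part: only the name `foo` is taken from `foo_closure` in the Cmm, while the `Maybe Int` argument type and the explicit `Nothing` branch are guesses based on the tag test (`R1 & 7 >= 2`) and the fallback to `lvl_r39S_closure`.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Hypothetical source function. The bang forces the argument to weak
-- head normal form before the case analysis, which is the point where
-- the generated code must decide whether the closure is already evaluated.
foo :: Maybe Int -> Int
foo !x =
  case x of
    Just y  -> y                     -- second constructor: return the field
    Nothing -> error "foo: Nothing"  -- stands in for the failure closure

main :: IO ()
main = do
  -- An argument built directly from a constructor.
  print (foo (Just 41))
  -- An argument produced by a function call, which may reach foo as a thunk.
  print (foo (fmap (+ 1) (Just 1)))
```

Whether a given call site actually passes a thunk or an already-tagged pointer depends on optimization level and inlining, so the two calls in `main` only illustrate the two cases the proposed fastpath/slowpath split is meant to distinguish.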