I've noticed that the Virtex-6 BRAM clock-to-output time is rather long when NOT using the optional output register (Trcko_DO = 2.08 ns) compared to when using it (Trcko_DOA_REG = 0.75 ns). Enabling the optional output register adds an extra cycle of latency, but the 1.33 ns it saves could be worth it, especially since the optional output register uses (or at least sounds like it uses) dedicated flip-flops inside the BRAM rather than CLB flip-flops.
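As a rough illustration of what the 1.33 ns buys, here is a small sketch that subtracts each clock-to-output figure from the clock period at a few fabric clock rates. The clock rates are assumptions chosen for illustration, and the calculation ignores setup time, routing delay, clock skew, and jitter, so it only shows how much of the cycle each option consumes, not whether a given design will close timing.

```python
# Clock-to-output figures quoted above for a Virtex-6 block RAM (ns).
T_RCKO_DO = 2.08       # optional output register disabled (latency 1)
T_RCKO_DOA_REG = 0.75  # optional output register enabled (latency 2)


def remaining_budget_ns(clock_mhz: float, clk_to_out_ns: float) -> float:
    """Time left in one clock period after the BRAM output becomes valid.

    Ignores setup time, routing, clock skew, and jitter at the destination
    flip-flop, so this is an optimistic upper bound, not a timing result.
    """
    period_ns = 1000.0 / clock_mhz
    return period_ns - clk_to_out_ns


savings_ns = T_RCKO_DO - T_RCKO_DOA_REG
print(f"clock-to-out saved by the output register: {savings_ns:.2f} ns")

# Illustrative fabric clock rates (assumptions, not from any specific design).
for clock_mhz in (200.0, 250.0, 300.0):
    period_ns = 1000.0 / clock_mhz
    without_reg = remaining_budget_ns(clock_mhz, T_RCKO_DO)
    with_reg = remaining_budget_ns(clock_mhz, T_RCKO_DOA_REG)
    print(f"{clock_mhz:.0f} MHz (period {period_ns:.2f} ns): "
          f"{without_reg:.2f} ns left without register, "
          f"{with_reg:.2f} ns left with register "
          f"({100.0 * savings_ns / period_ns:.0f}% of the period recovered)")
```

The faster the fabric clock, the larger the fraction of the period that the output register gives back, which is why the extra cycle of latency tends to matter more as designs push toward higher clock rates.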
Setting a Xilinx "Single Port RAM" block's latency to 2 enables the optional output register. Unfortunately, the CASPER "Shared BRAM" block does not have a latency setting (it defaults to 1?), so the tools do not use the underlying BRAM's optional output register, even if there is a register on the Shared BRAM's output that could, in theory, be absorbed into the underlying BRAM.

It would be good to add an optional "Latency" parameter to the Shared BRAM block that lets the user select "1" (the current behavior, which does not use the BRAM's optional output register) or "2" (new behavior, which does). I think this would help ROACH2 designs meet timing more easily. I will look into this in more detail to see how involved adding this feature would be. Once it's added, I think we'll want to set the default to 2. It would also be good to make sure the PPC side of the Shared BRAM uses the optional output registers too.

I also think we should be recommending that regular (i.e. non-yellow) BRAM blocks be set to a latency of at least 2. Maybe we are already? This also seems to be an issue for ROACH (though the smaller chip seems faster to cross, so maybe it's not as important as on ROACH2?).

Dave

