On 10/01/2014, at 6:17, Adam Wick wrote:

> That’s the problem with SHA, then. The implementation (and the spec, really) 
> is essentially a long combination of the form:
> 
> let x_n5 = small_computation x_n1 x_n2 x_n3 x_n4
>      x_n6 = small_computation x_n2 x_n3 x_n4 x_n5
>      …
> 
> It has ~70 entries. The actual number of variables live at any one time 
> should be relatively small, but if slots aren’t getting reused there’s going 
> to be some significant blowup. (To be honest, I had figured — and thought I 
> had validated — that doing it this way would give the compiler the best 
> chance at generating optimal code, but it appears I merely set myself up to 
> hit this limitation several years later.)

If this [1] is the current version, then I don't think there is any performance 
reason to manually unroll the loops like that. If you write a standard 
tail-recursive loop, then the processor's branch predictor should predict the 
loop's backward branch correctly on every iteration except the last one. You'll 
get one pipeline stall at the end due to that mispredicted branch, but it is 
negligible as a fraction of total execution time. You generally only need to 
worry about branches whose condition flips between True and False frequently 
and unpredictably.
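To make that concrete, here is a minimal sketch of SHA-1 written with ordinary tail-recursive loops. It is not the Galois library's code, and every name in it (`schedule`, `compress`, `rounds`, `sha1Hex`, and so on) is hypothetical. `schedule` produces the 80 message words with a loop over a 16-word sliding window instead of 64 separately named bindings, and `rounds` runs the 80 compression rounds as a loop over that list. It uses plain lists for readability, so it says nothing about the speed of a tuned implementation; it only shows the loop shape.

```haskell
import Data.Bits (complement, rotateL, shiftL, shiftR, xor, (.&.), (.|.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32, Word64, Word8)
import Numeric (showHex)

-- The five 32-bit chaining variables; strict fields so each round's
-- results are evaluated as the loop runs instead of piling up thunks.
data State = State !Word32 !Word32 !Word32 !Word32 !Word32

-- SHA-1 padding: append 0x80, zero bytes until the length is 56 mod 64,
-- then the original length in bits as a 64-bit big-endian integer.
pad :: [Word8] -> [Word8]
pad msg = msg ++ [0x80] ++ replicate zeros 0 ++ lenBytes
  where
    len = length msg
    zeros = (55 - len) `mod` 64
    bitLen = fromIntegral len * 8 :: Word64
    lenBytes = [fromIntegral (bitLen `shiftR` (8 * i)) | i <- [7, 6 .. 0]]

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (a, b) = splitAt n xs in a : chunksOf n b

-- Four big-endian bytes to one word.
toWord32 :: [Word8] -> Word32
toWord32 = foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Message schedule as a tail-recursive loop: the window holds w[t-16..t-1],
-- so w[t-3], w[t-8], w[t-14] and w[t-16] sit at indices 13, 8, 2 and 0.
schedule :: [Word32] -> [Word32]
schedule w0 = go 16 w0 (reverse w0)
  where
    go :: Int -> [Word32] -> [Word32] -> [Word32]
    go t window acc
      | t >= 80 = reverse acc
      | otherwise = case window of
          oldest : rest ->
            let next = rotateL (window !! 13 `xor` window !! 8 `xor` window !! 2 `xor` oldest) 1
             in go (t + 1) (rest ++ [next]) (next : acc)
          [] -> reverse acc -- unreachable: the window always holds 16 words

roundF :: Int -> Word32 -> Word32 -> Word32 -> Word32
roundF t b c d
  | t < 20 = (b .&. c) .|. (complement b .&. d)
  | t < 40 = b `xor` c `xor` d
  | t < 60 = (b .&. c) .|. (b .&. d) .|. (c .&. d)
  | otherwise = b `xor` c `xor` d

roundK :: Int -> Word32
roundK t
  | t < 20 = 0x5A827999
  | t < 40 = 0x6ED9EBA1
  | t < 60 = 0x8F1BBCDC
  | otherwise = 0xCA62C1D6

-- One 64-byte block: the 80 rounds are a tail-recursive loop over the
-- schedule rather than 80 hand-written bindings.
compress :: State -> [Word8] -> State
compress st@(State h0 h1 h2 h3 h4) block = addBack (rounds 0 st ws)
  where
    ws = schedule (map toWord32 (chunksOf 4 block))
    rounds :: Int -> State -> [Word32] -> State
    rounds t (State a b c d e) (w : rest) =
      let tmp = rotateL a 5 + roundF t b c d + e + roundK t + w
       in rounds (t + 1) (State tmp a (rotateL b 30) c d) rest
    rounds _ s [] = s
    addBack (State a b c d e) = State (h0 + a) (h1 + b) (h2 + c) (h3 + d) (h4 + e)

sha1 :: [Word8] -> [Word32]
sha1 msg = [a, b, c, d, e]
  where
    initial = State 0x67452301 0xEFCDAB89 0x98BADCFE 0x10325476 0xC3D2E1F0
    State a b c d e = foldl' compress initial (chunksOf 64 (pad msg))

-- Hex digest of a String; assumes ASCII input (each Char becomes one byte).
sha1Hex :: String -> String
sha1Hex = concatMap hex8 . sha1 . map (fromIntegral . ord)
  where
    hex8 w = let s = showHex w "" in replicate (8 - length s) '0' ++ s
```

In GHCi, `sha1Hex "abc"` should give the standard test vector `a9993e364706816aba3e25717850c26c9cd0d89d`.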

If you care deeply about performance, then on some processors it can help to 
unroll this sort of code so that the SHA constants are encoded as immediate 
literals in the instruction stream instead of being loaded from static data 
memory. Whether that pays off is very processor-specific, though, and you'd need 
to really stare at the emitted assembly code to see if it's worthwhile.
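A rough sketch of the two ways of supplying the constants, using SHA-1's four round constants because they are short. `kTable`/`kFromTable` keep them in an unboxed array that each round reads from memory; `kLiteral` writes them as literals in the code. All three names are hypothetical. Whether GHC actually turns the literal version into immediate operands depends on the GHC version, flags, target, and on the rounds being fully unrolled so that each call has a known round index, which is why you have to check the assembly (for example with GHC's `-ddump-asm` flag).

```haskell
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Word (Word32)

-- Variant A: the constants live in an 80-entry unboxed array, so fetching
-- K(t) is a load from memory at run time.
kTable :: UArray Int Word32
kTable =
  listArray (0, 79) $
    concatMap (replicate 20) [0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6]

kFromTable :: Int -> Word32
kFromTable t = kTable ! t

-- Variant B: the constants are literals in the code. If the rounds are fully
-- unrolled, t is known at each call site and GHC may fold this to a constant
-- that the backend can emit as an immediate; nothing here guarantees that,
-- so inspect the output of `ghc -O2 -ddump-asm` on your target.
kLiteral :: Int -> Word32
kLiteral t
  | t < 20 = 0x5A827999
  | t < 40 = 0x6ED9EBA1
  | t < 60 = 0x8F1BBCDC
  | otherwise = 0xCA62C1D6
```

Both functions return the same value for every round index 0 to 79; only where the constant comes from differs.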

Ben.


[1] https://github.com/GaloisInc/SHA/blob/master/Data/Digest/Pure/SHA.hs


_______________________________________________
ghc-devs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/ghc-devs
