On 03/16/2011 02:25 AM, Peter Turczak wrote:
> Hi Magnus, hi Rob,
>
> a while ago I made the same observations you did. On an m68k-nommu
> with 166 MHz the RSA exchange took quite forever. After some
> profiling I found out the comba multiply routine in libtommath was
> eating most of the time. It seems gcc produces quite inefficient code
> there. Libtommath resizes its large integers while calculating
> leading to more work for user memory management.

User memory management? It's got a malloc/free in an inner loop? BARF!

(Yeah, that'll blow your L1 cache wide open and slow stuff down by at
least an order of magnitude. Allocation functions are some of the most
cache-unfriendly things you can do: unused memory is not cache hot,
pretty much by definition. That's sort of the point. Copying the data
sucks too, but it's doing the copying on all platforms, I'd guess...)

Rob
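
To make the inner-loop point concrete, here is a minimal sketch in C of the two allocation patterns being contrasted. It is not libtommath's code: `work`, `sum_alloc_each_iter` and `sum_alloc_once` are hypothetical names made up for illustration, and the "digit" arithmetic is a trivial stand-in for a real multiply. Both variants compute the same result; the only difference is whether the scratch buffer is allocated on every pass or once up front. No timings are claimed here, since the actual cost depends on the allocator and the target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one step of big-integer work:
 * fill the scratch buffer with "digits" and return their sum. */
static uint32_t work(uint32_t *scratch, size_t digits, uint32_t seed)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < digits; i++) {
        scratch[i] = seed + (uint32_t)i;
        acc += scratch[i];
    }
    return acc;
}

/* Pattern A: malloc/free inside the loop. Every iteration goes through
 * the allocator and may be handed memory that is not in cache.
 * Returns 0 if an allocation fails (sketch-level error handling). */
uint32_t sum_alloc_each_iter(size_t iters, size_t digits)
{
    assert(digits > 0);
    uint32_t total = 0;
    for (size_t n = 0; n < iters; n++) {
        uint32_t *scratch = malloc(digits * sizeof *scratch);
        if (scratch == NULL)
            return 0;
        total += work(scratch, digits, (uint32_t)n);
        free(scratch);
    }
    return total;
}

/* Pattern B: allocate once, reuse the same buffer on every iteration,
 * so the loop body touches only memory it has already touched. */
uint32_t sum_alloc_once(size_t iters, size_t digits)
{
    assert(digits > 0);
    uint32_t *scratch = malloc(digits * sizeof *scratch);
    if (scratch == NULL)
        return 0;
    uint32_t total = 0;
    for (size_t n = 0; n < iters; n++)
        total += work(scratch, digits, (uint32_t)n);
    free(scratch);
    return total;
}
```

Timing the two functions on the target board with a digit count close to the RSA key size would show how much of the slowdown really comes from the allocator rather than from the multiply itself.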
