Frode Tennebø
Sat, 17 May 2008 03:57:40 -0700
Yep, I think I'm on approximately 2000 cycles for every divide.
The 16x16 multiply uses no tables whatsoever and costs between 700 and 1200 cycles depending on RAM timings.
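For anyone following along, a table-free 16x16 multiply is usually some form of shift-and-add. The C below is only a sketch of that general technique, not the routine being discussed; the name `mul16x16` is made up for illustration, and a real 8-bit version would spread the 32-bit accumulator across register pairs.

```c
#include <stdint.h>

/* Shift-and-add 16x16 -> 32-bit multiply, using no lookup tables.
 * Hypothetical illustration: one pass per multiplier bit, as an
 * 8-bit CPU loop would do with shifts and conditional adds. */
uint32_t mul16x16(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend = a;        /* multiplicand, shifted left each pass */

    for (int i = 0; i < 16; i++) {
        if (b & 1u)             /* low multiplier bit set: add in */
            product += addend;
        addend <<= 1;           /* next bit has twice the weight */
        b >>= 1;                /* move on to the next multiplier bit */
    }
    return product;
}
```

The cost is always 16 shift steps plus one add per set bit in the multiplier, so the cycle count depends on the data as well as on memory timing.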
From memory, I believe I once had a 16-bit divide which, when unrolled, took slightly more than 1000 T-states, and a 16-bit multiply which took around 500 T-states.
Could you post your routines so we can collectively try optimising them?
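As a reference point for the divide, here is a sketch in C of plain restoring (shift-and-subtract) division. It is an assumption about the general approach, not the routine mentioned above, and `div16` is a hypothetical name. "Unrolled" would mean writing the loop body out 16 times so the loop counter and branch disappear.

```c
#include <stdint.h>

/* Restoring 16-bit / 16-bit unsigned division (shift-and-subtract).
 * Hypothetical illustration; the caller must ensure divisor != 0.
 * Returns the quotient and stores the remainder via *remainder. */
uint16_t div16(uint16_t dividend, uint16_t divisor, uint16_t *remainder)
{
    uint32_t rem = 0;           /* wider than 16 bits so the shift cannot overflow */
    uint16_t quot = 0;

    for (int i = 15; i >= 0; i--) {
        /* bring down the next dividend bit, most significant first */
        rem = (rem << 1) | ((uint32_t)(dividend >> i) & 1u);
        quot <<= 1;
        if (rem >= divisor) {   /* trial subtraction succeeds */
            rem -= divisor;
            quot |= 1u;         /* record a 1 in the quotient */
        }
    }
    *remainder = (uint16_t)rem;
    return quot;
}
```

Every pass does a shift, a compare and possibly a subtract, which is why a divide tends to cost roughly twice as much as a multiply of the same width.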
-Frode

PS: Very nice video BTW!