Hmmm.

I disassembled my own code, and I'm finding essentially the same
inner loops as you did.  But the speed of the two approaches differs by
a factor of 5 on my machine.  Could it be something related to cache
sizes?  Maybe my machine is continually missing the cache on the
dereference and yours is continually hitting it.
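
One rough way to check the cache idea independently of either program is to
run the same chain of dependent lookups over the same amount of data, once
laid out in order and once shuffled. This is only a sketch, not the code under
discussion: `make_chain`, `chase`, and `time_chase` are hypothetical names,
and the table size is an arbitrary assumption that should be chosen larger
than the machine's cache. Interpreter overhead in Python will likely mask much
of the gap, so a compiled version would be a better test.

```python
import random
import time


def make_chain(n, shuffled, seed=0):
    """Build a table where following table[i] from 0 visits every slot once."""
    order = list(range(n))
    if shuffled:
        # Keep slot 0 as the start; randomize the order of the rest.
        tail = order[1:]
        random.Random(seed).shuffle(tail)
        order = [0] + tail
    next_index = [0] * n
    # Link each slot to its successor in `order`, wrapping around at the end.
    for here, there in zip(order, order[1:] + order[:1]):
        next_index[here] = there
    return next_index


def chase(next_index, steps):
    """Do `steps` dependent lookups; each one needs the previous result."""
    i = 0
    for _ in range(steps):
        i = next_index[i]
    return i


def time_chase(next_index, steps):
    """Return the wall-clock seconds taken by chase()."""
    start = time.perf_counter()
    chase(next_index, steps)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1 << 22  # assumed size; pick something larger than the cache
    for label, shuffled in (("sequential", False), ("shuffled", True)):
        chain = make_chain(n, shuffled)
        print(label, time_chase(chain, n))
```

Both runs execute the identical loop, so any difference in time comes from
the memory access pattern rather than from the instructions in the inner loop.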

I'll post my code tomorrow, but it's quite similar to yours.

rif

