> I recall from my benchmarking days that -- depending on compiler --
> there is a small dereferencing penalty for packed matrices (vectors
> packed into dereferencing **..* pointers) compared to doing the offset
> arithmetic via brute force inline or via a macro.
> ......
> I haven't run the benchmark recently and don't know how large it
> currently is. It was never so large that it stopped me from using
> repacked pointers for code clarity..

Mostly unscientific, but worth tossing into the mix: using the Intel 10.1 compilers on a fairly recent AMD chip, 100,000 iterations over a 10x10 double matrix run neck-and-neck whether I use nested pointers or index arithmetic. For the 100x100 case, iterating with nested pointers takes 1.3 times longer. The work in the inner loop "compute kernel" is a *= against a constant scalar, and everything is compiled with -O3. I've seen similar behavior with recent GNU compilers.

I'm happy to provide the test code if anyone's interested.

- Rhys
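
For readers who want to try the comparison themselves, here is a minimal sketch of that kind of test. It is not the test code offered above: the function names (alloc_nested, scale_nested, scale_flat, time_kernels), the row-pointer layout over one contiguous block, and the use of clock() for timing are all assumptions made for illustration. Timings will depend heavily on compiler, flags, and hardware.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: allocate an n x n matrix as one contiguous block
   plus a table of row pointers into it ("repacked pointers").
   Returns NULL on allocation failure. */
double **alloc_nested(size_t n, double **flat_out)
{
    double *flat = malloc(n * n * sizeof *flat);
    double **rows = malloc(n * sizeof *rows);
    if (!flat || !rows) { free(flat); free(rows); return NULL; }
    for (size_t i = 0; i < n; ++i)
        rows[i] = flat + i * n;
    *flat_out = flat;
    return rows;
}

/* Kernel 1: nested pointers, a[i][j] *= c */
void scale_nested(double **a, size_t n, double c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            a[i][j] *= c;
}

/* Kernel 2: explicit index arithmetic, a[i*n + j] *= c */
void scale_flat(double *a, size_t n, double c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            a[i * n + j] *= c;
}

/* CPU seconds for `iters` passes of each kernel over an n x n matrix.
   Returns 0 on success, -1 on allocation failure. */
int time_kernels(size_t n, long iters, double *t_nested, double *t_flat)
{
    double *flat;
    double **rows = alloc_nested(n, &flat);
    if (!rows) return -1;
    for (size_t k = 0; k < n * n; ++k) flat[k] = 1.0;

    clock_t t0 = clock();
    for (long it = 0; it < iters; ++it) scale_nested(rows, n, 1.0000001);
    clock_t t1 = clock();
    for (long it = 0; it < iters; ++it) scale_flat(flat, n, 1.0000001);
    clock_t t2 = clock();

    /* Read a result so the stores are less likely to be optimized away. */
    volatile double sink = flat[n * n - 1];
    (void)sink;

    *t_nested = (double)(t1 - t0) / CLOCKS_PER_SEC;
    *t_flat = (double)(t2 - t1) / CLOCKS_PER_SEC;
    free(rows);
    free(flat);
    return 0;
}
```

Calling time_kernels(10, 100000, &tn, &tf) and time_kernels(100, 100000, &tn, &tf) from a small main() and printing tn / tf gives a ratio comparable in spirit to the figures above. Because both kernels are short and may be inlined or vectorized differently, it is worth repeating the runs and swapping their order before trusting any single number.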
