And Jiahao Chen writes:
> I tried to manually inline idxmaxabs. It made absolutely no difference
> on my machine. The row scaling takes ~0.05% of total execution time.

Simply inlining, sure, but you could scale inside the outer loop
and find the next pivot in the inner loop.  Making only a single
pass over the data should save more than 0.05% once you leave
cache.  But as long as you're in cache (500x500 doubles is
approx. 2MiB), not much will matter.
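
To make the single-pass idea concrete, here is a rough sketch in
plain Python.  It is not the code under discussion: the function
name lu_complete_pivot_fused and the list-of-lists layout are my
own assumptions, and I am guessing that idxmaxabs is a separate
search over the trailing block.  The multiplier scaling sits in
the outer (row) loop, and the search for the next pivot rides
along with the trailing update in the inner loop, so each step
touches the trailing block once instead of twice.

```python
def lu_complete_pivot_fused(a):
    """In-place LU with complete pivoting on a square list-of-lists.

    Illustrative sketch only. On return, `a` holds the unit-lower L
    (below the diagonal) and U (on and above it), and the returned
    (rows, cols) satisfy A[rows[i]][cols[j]] == (L @ U)[i][j]
    up to rounding.
    """
    n = len(a)
    rows = list(range(n))
    cols = list(range(n))

    # One separate search is still needed for the very first pivot.
    best, pi, pj = 0.0, 0, 0
    for i in range(n):
        for j in range(n):
            if abs(a[i][j]) > best:
                best, pi, pj = abs(a[i][j]), i, j

    for k in range(n - 1):
        if best == 0.0:
            break  # trailing block is exactly zero; nothing left to do

        # Bring the pivot found during the previous update to (k, k).
        a[k], a[pi] = a[pi], a[k]
        rows[k], rows[pi] = rows[pi], rows[k]
        for row in a:
            row[k], row[pj] = row[pj], row[k]
        cols[k], cols[pj] = cols[pj], cols[k]

        pivot = a[k][k]
        rk = a[k]
        best, pi, pj = 0.0, k + 1, k + 1
        for i in range(k + 1, n):
            ri = a[i]
            ri[k] /= pivot              # scaling, in the outer loop
            m = ri[k]
            for j in range(k + 1, n):
                ri[j] -= m * rk[j]      # trailing update
                v = abs(ri[j])
                if v > best:            # next pivot search, same pass
                    best, pi, pj = v, i, j
    return rows, cols
```

Pure Python will not show the memory-traffic effect, of course.
The point is only the loop structure, which carries over to a
compiled language where the saved pass matters out of cache.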

Ultimately, I'm not sure who's interested in complete pivoting
for LU.  That choice alone kills performance on modern machines
for negligible benefit.  You likely would find more interest for
column-pivoted QR or rook pivoting in LDL^T.
