Re: Can this implementation of Damm algorithm be optimized?

Era Scarecrow via Digitalmars-d-learn Thu, 09 Feb 2017 15:52:07 -0800

On Thursday, 9 February 2017 at 19:39:49 UTC, Nestor wrote:

OK I changed the approach using a multidimensional array for the matrix so I could ditch arithmetic operations altogether, but curiously after measuring a few thousand runs of both implementations through avgtime, I see no noticeable difference. Why?

Truthfully, because you'll need inputs tens or hundreds of millions of digits long to determine whether it makes a difference and how much. Addition, subtraction and simple memory lookups take very little time, and since the entire array (100 bytes) fits in the cache, it is going to perform very well regardless of whether you optimize it further.
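For reference, the table-driven loop under discussion looks roughly like this. It is a minimal sketch in C (it carries over to D almost line for line), and it assumes the commonly published Damm quasigroup table, since the actual code isn't quoted in this message. The names damm_table, damm_check_digit and damm_is_valid are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Commonly published Damm quasigroup table (an assumption; the thread
   does not show the table actually used). 100 one-byte entries. */
static const uint8_t damm_table[10][10] = {
    {0, 3, 1, 7, 5, 9, 8, 6, 4, 2},
    {7, 0, 9, 2, 1, 5, 4, 8, 6, 3},
    {4, 2, 0, 6, 8, 7, 1, 3, 5, 9},
    {1, 7, 5, 0, 9, 8, 3, 4, 2, 6},
    {6, 1, 2, 3, 0, 4, 5, 9, 7, 8},
    {3, 6, 7, 4, 2, 0, 9, 5, 8, 1},
    {5, 8, 6, 9, 7, 2, 0, 1, 3, 4},
    {8, 9, 4, 5, 3, 6, 2, 0, 1, 7},
    {9, 4, 3, 8, 6, 1, 7, 2, 0, 5},
    {2, 5, 8, 1, 4, 3, 6, 7, 9, 0},
};

/* Returns the check digit (0-9) for a string of ASCII decimal digits,
   or -1 if the string contains a character that is not a digit. */
int damm_check_digit(const char *digits)
{
    uint8_t interim = 0;
    for (size_t i = 0; digits[i] != '\0'; ++i) {
        if (digits[i] < '0' || digits[i] > '9')
            return -1;
        /* One table lookup per input digit: this is the whole O(n) loop. */
        interim = damm_table[interim][digits[i] - '0'];
    }
    return interim;
}

/* A number with its check digit appended is valid when the result is 0. */
int damm_is_valid(const char *digits)
{
    return damm_check_digit(digits) == 0;
}
```

With this table, damm_check_digit("572") gives 4, so "5724" validates. Each digit costs one subtraction and one load from a table that stays in cache, which is why a few thousand short runs can't tell two variants apart.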

If you tested this on a much slower system, say an 8-bit 6502, the differences would be more pronounced, but still not that large.

Since the algorithm is already more or less O(n), optimizing it won't make much difference.

It's possible you could get a speedup by making the elements ints instead of chars, since then there might not be a penalty for the 'address not divisible by 4' case (which applies more to ARM architectures and less to x86).

Another optimization could be to make it multiple levels, taking the basic 100 elements and expanding them 2-3 levels deep in a lookup so that several digits are handled in more or less a single operation (100 bytes for 1 level, 10,000 for 2 levels, 1,000,000 for 3 levels, 100,000,000 for 4 levels, etc). But the steps of converting the input to the array index won't give you that much gain, and although there would be fewer memory lookups, none of them would be cached, so any advantage from that is probably lost.

Although if you bump up the size to 16x10 instead of 10x10, you could use a shift instead of *10, which will make that slightly faster (there will be unused, empty padding slots).
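The 16-wide padding idea could be sketched like this, again in C and again assuming the commonly published Damm table (stored here as a 100-character string to keep it short). All names are hypothetical, and both functions assume the input contains only ASCII digits.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed Damm table, row-major, one ASCII digit per entry (10 x 10). */
static const char damm_rows[] =
    "0317598642" "7092154863" "4206871359" "1750983426" "6123045978"
    "3674209581" "5869720134" "8945362017" "9438617205" "2581436790";

/* 10 rows of 16 columns; columns 10-15 of each row are unused padding. */
static uint8_t damm_padded[10 * 16];

/* Copy the 10x10 table into the padded layout. Call once before damm_shift. */
void damm_padded_init(void)
{
    for (unsigned row = 0; row < 10; ++row)
        for (unsigned col = 0; col < 10; ++col)
            damm_padded[(row << 4) | col] =
                (uint8_t)(damm_rows[row * 10 + col] - '0');
}

/* Flat 10x10 lookup: the index calculation needs a multiply by 10. */
unsigned damm_flat(const char *digits)
{
    unsigned interim = 0;
    for (size_t i = 0; digits[i] != '\0'; ++i)
        interim = (unsigned)(damm_rows[interim * 10 + (unsigned)(digits[i] - '0')] - '0');
    return interim;
}

/* Padded lookup: a shift by 4 replaces the multiply. */
unsigned damm_shift(const char *digits)
{
    unsigned interim = 0;
    for (size_t i = 0; digits[i] != '\0'; ++i)
        interim = damm_padded[(interim << 4) | (unsigned)(digits[i] - '0')];
    return interim;
}
```

Both functions return the same interim digit for the same input. The padded table costs 160 bytes instead of 100, and an optimizing compiler may already turn the *10 into shifts and adds on its own, so any gain is likely to be small.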

In theory, if you avoid the memory lookup altogether, you could gain some amount of speed, depending on how it searches a manual table, although using a switch-case and a mixin to do all the details feels like it wouldn't give you any gain...

Division operations are the slowest operations you can do, but otherwise most instructions run really fast. Unless you're trying to make it smaller (fewer bytes for the call) or shaving time by counting instruction cycles (like on the 6502), I doubt you'll get much benefit.
