On Sat, Apr 21, 2012 at 05:45:35PM -0700, Jonathan M Davis wrote: > On Saturday, April 21, 2012 20:36:18 bearophile wrote: > > Jonathan M Davis: > > > I'm not sure if it's an order of magnitude worse than your > > > solution though, since it's been a while since I studied Big(O) > > > notation (doing the conversion only once is still more expensive > > > than not converting, but I'm not sure how much more expensive - it > > > might cost less than sort such that it actually doesn't matter as > > > for as Big(O) goes though). > > > > Performing the toLower every time the cmp function is called doesn't change > > the O complexity. > > Yes it does. It adds a loop to each comparison (two loops actually, > but since they're not nested, Big(O) only cares about the one), since > toLower has to loop over all of the characters. As sort loops over > each of the strings to compare them for moving them into a sorted > position or not, it calls toLower, which adds a nested loop, so it > increases the Big(O) complexity. Something like this > > foreach(str; strings) > str < otherStr; > > becomes > > foreach(str; strings) > { > foreach(dchar c; str) > //lower characters > > foreach(dchar c; otherStr) > //lower characters > > str < otherStr; > } > > though that's obviously very pseudo-code-ish and not exactly what sort > does. Regardless, those extra loops when the comparison happens are > nested and therefore increase the Big(O) complexity of the overall > algorithm. [...]
Actually, I don't think the nested loops affect Big-O complexity at all. The O(l) complexity (where l = string length) is already inherent in the string comparison "str < otherStr". Adding more loops over the strings doesn't change the Big-O complexity (you just make O(l) into 3*O(l) which is the same as O(l)). However, always remember that Big-O notation hides constant factors and terms. These factors are negligible given a sufficiently large problem size, but for small problem sizes, the hidden constants are very much significant. An O(n log n) algorithm may actually take n*log(10*n) steps or 1000*n*log(n) steps; for large enough n, they're approximately the same, but when n is small, the 1000 has a very significant hit on observed performance. That's why optimizing the constant factors is sometimes necessary (provided you've already minimized the big-O complexity to the full, since otherwise you're just pinching pennies yet freely spending $100 bills). Inner loop optimization, like strength reduction, loop invariant hoisting, etc., are examples of where constant factors get optimized. If you have a loop: real x; for (i=0; i < n; i++) { // bigComplexCalculation is independent of i real n = bigComplexCalculation(x); // make use of n in some way } Then moving the bigComplexCalculation() call outside the loop improves the observed performance significantly, even though you're not changing the big-O complexity: a small change from 10*n to 8*n means a 20% improvement in observed performance, even though the algorithm still degrades linearly with problem size just like before. T -- Microsoft is to operating systems & security ... what McDonalds is to gourmet cooking.