I've read the proposed patches that have been batted around on the mailing list (after coming up with my own implementation :D of course). My proposed solution is less generic, but I believe more robust, than the other approaches.
I've laid out my reasoning below, and I've also filed it as a bug on Launchpad to track this issue: 313152 <https://bugs.launchpad.net/bugs/313152>. The patch is against 6.10 instead of trunk, mainly because I was too lazy to get the build system set up on Ubuntu. That being said, I'm pretty sure the patch should still work against the trunk. In any case, if it's necessary, I could also do the diff against the trunk.

Code review? What would I need to do to get this mainlined (aside from adding the documentation changes)?

REASONING:

One of my major assumptions is that all the numbers are well formatted. In other words, there's an explicit demarcation in the number line (at least internal to the input being sorted) after which the suffix increases and the number starts again near 0. For instance, if M represents 1050 kilobytes, then there's no 1051K - it's represented as 1.001M or something along those lines. Again, this only relies on the input being internally consistent - sort needs no knowledge or hints of what those suffixes represent.

Also, there can be no exponential numbers when in this mode, mainly because it's unclear whether an `E' represents the beginning of the exponent or an exabyte. Both would be uncommon use cases: exabytes are really, really big right now, and exponents would be meaningless since they could only be used for extremely small numbers or numbers that are bigger than a Y suffix. However, from a consistent-behaviour and flexibility standpoint (suffixes can be extended much more easily in a consistent manner without worrying about precision), exponents lose out. Also, at the end of the day, the common use case is the du & df utilities (at least, those are the only ones for which I consistently see this come up as an issue, presumably because ls has its own internal sort).

The suffix is case insensitive - `k' is equivalent to `K'.
There are arguments that can be made either way, and I could easily be persuaded on this issue (maybe even add a flag to determine behaviour in this case).

The advantage of this approach is that the code is far simpler, faster, and more accurate. It's simpler because there's no need to worry about what the suffix actually represents (power of 10, power of 2). It's faster because there's no expensive conversion to a double as with the other proposed solutions I've seen. It's more accurate because it uses a numeric string comparison rather than converting to a numerical form, which could have precision & overflow issues.
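
To make the comparison concrete, here is a rough sketch of the idea in Python. It is not the patch itself (which is C against the sort source), and the names SUFFIX_ORDER, split_number and compare_human are made up for illustration. It assumes the suffix order K, M, G, T, P, E, Z, Y, treats a bare number as ranking below any suffix, and treats zero as equal regardless of suffix; those last two choices are my assumptions for the sketch, not something the patch necessarily does.

```python
import functools

# Assumed suffix order, smallest to largest; matching is case insensitive.
SUFFIX_ORDER = "KMGTPEZY"


def split_number(text):
    """Split e.g. '-1.50M' into (sign, integer digits, fraction digits, rank).

    No numeric conversion happens here: the digits stay as strings.
    """
    text = text.strip()
    sign = 1
    if text.startswith("-"):
        sign, text = -1, text[1:]
    rank = 0  # 0 means "no suffix"
    if text and text[-1].upper() in SUFFIX_ORDER:
        rank = SUFFIX_ORDER.index(text[-1].upper()) + 1
        text = text[:-1]
    integer, _, fraction = text.partition(".")
    integer = integer.lstrip("0")    # leading zeros carry no value
    fraction = fraction.rstrip("0")  # neither do trailing fraction zeros
    if not integer and not fraction:
        return 0, "", "", 0          # assumption: zero ignores its suffix
    return sign, integer, fraction, rank


def compare_human(a, b):
    """Return a negative, zero, or positive value, like strcmp."""
    sign_a, int_a, frac_a, rank_a = split_number(a)
    sign_b, int_b, frac_b, rank_b = split_number(b)
    if sign_a != sign_b:
        return -1 if sign_a < sign_b else 1
    # Same sign: the suffix decides first, then the digit strings.
    # A longer integer part is larger; equal lengths compare
    # lexicographically, which matches numeric order for digits.
    key_a = (rank_a, len(int_a), int_a, frac_a)
    key_b = (rank_b, len(int_b), int_b, frac_b)
    magnitude = (key_a > key_b) - (key_a < key_b)
    return magnitude * sign_a  # flip the order for negative numbers


sizes = ["1.5M", "900K", "12", "2G", "1.001M", "512k"]
print(sorted(sizes, key=functools.cmp_to_key(compare_human)))
# prints ['12', '512k', '900K', '1.001M', '1.5M', '2G']
```

Note that nothing here knows whether K means 1000 or 1024, and since the digits are never converted to a double, two long integers that differ only in the last digit still compare as different.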
