On Tue, Jan 6, 2009 at 12:26 PM, Pádraig Brady <p...@draigbrady.com> wrote:
> Vitali Lovich wrote:
>> On Tue, Jan 6, 2009 at 10:19 AM, Pádraig Brady <p...@draigbrady.com> wrote:
>>> I like the idea.
>>>
>>> So it doesn't support sorting these correctly for example:
>>>
>>> 999MB
>>> 998MiB
>>> 1GiB
>>> 1030MiB
>>>
>>> I.E. a mixture of ^2 and ^10 are not supported,
>>> nor overlapping number ranges.
>
> I'm not complaining about the above. Just clarifying.
>
>>> + /* FIXME: maybe add option to check for longer suffixes (i.e. gigabyte) */
>>>
>>> You should allow at least G, GiB and GB formats.
>>> Probably should print error if more than one of those
>>> formats used, since that's not supported.
>>
>> I dunno if you read my previous post, but I presented the reasoning
>> that if the user has some kind of longer format, it's better handled
>> by piping the input through a sed script first. Can you present a
>> situation where it would be better for sort itself to try and parse
>> longer suffixes?
>>
>> On a side note, the XiB format (MiB, GiB) is extremely uncommon in my
>> opinion.
>
> It's debatable, but I think we should support the XiB and XB formats
> as I've seen them quite often, and certain coreutils like dd for example
> take this format as a size specifier. Also look at human_readable() from
> gnulib.

Perhaps - but for sort, at least from my thinking of how I would
implement this, the additional logic (at least to behave correctly on
all inputs) would be somewhat complicated. Can you please explain why
you believe this belongs in sort and wouldn't be better served by
pre-processing the text before sort & post-processing it after as
necessary?
Supporting all the various ways human_readable() can format its output
is just not practical or even useful, since the user would have to
refer to the manpage every time to figure out which switches to enable
to get the proper behaviour. Also, compare the amount of code
human_readable() needs to convert a number into a string (a much
easier problem) with how much additional code is needed to add this one
feature.

Sure, dd may take that as input, but it's in a different situation: it
actually needs to understand what number the user is representing. We
don't need that extra logic since sort doesn't really need to know; it
can work without converting the string to a number.

I'm not saying you're incorrect - I'm just asking you to justify it by
providing a use-case where leaving the logic out of sort would force a
complicated shell-script workaround on the end user.

> Alternatively you could allow any string starting with [KMGT..]
> to allow things like KB/s KiBuckets, but then it would be
> tricky to flag mixtures of KiB and KiBuckets as an error for example.

That's definitely not an acceptable solution, because the behavior
would be incorrect if you had something like 2Klingons: it would be
treated as if it carried a K suffix.

>>> + /* FIXME: a_order - b_order || raw_comparison can be used - would that
>>> + be faster? */
>>>
>>> Yep if you're not supporting overlapping number ranges then
>>> you can skip the number comparison totally if the suffixes don't match.
>> Actually it has nothing to do with that. I was just thinking that the
>> equality operation I'm testing for is already essentially doing a
>> subtraction and then I'm returning the actual subtraction itself.
>
> Oh right.
> Anyway the optimization I mentioned would probably be useful.

Debatable. You'd still have to scan the string to find the end of the
number in order to find the suffix. And if you get a miss (i.e. same
suffix level), then you'll have to scan the strings again to perform
the comparison.
So it's not even obvious that there would be an advantage when the
suffixes differ. It might be faster, but I don't think it can possibly
be more than a 2-3% difference, since you're just skipping the
comparison of two characters that are presumably already in registers,
or at least in the cache. And there's definitely a hit (about 2x
slower) when the suffixes are the same.

Vitali
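
For reference, the suffix-first comparison being debated can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the patch: the names suffix_order,
scan_number and human_compare are hypothetical, inputs are assumed to
be non-negative integers followed by at most one uppercase suffix
letter, and all values are assumed to use the same base.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rank of a one-letter suffix (K=1, M=2, ...).
   Returns 0 when there is no recognised suffix. */
static int suffix_order(char c)
{
    static const char units[] = "KMGTPEZY";
    const char *p = (c != '\0') ? strchr(units, c) : NULL;
    return p ? (int)(p - units) + 1 : 0;
}

/* Hypothetical helper: skip redundant leading zeros, return a pointer
   to the first significant digit and store the digit count in *len.
   This is the scan needed to locate the suffix. */
static const char *scan_number(const char *s, size_t *len)
{
    size_t n = 0;

    while (*s == '0' && isdigit((unsigned char)s[1]))
        s++;
    while (isdigit((unsigned char)s[n]))
        n++;
    *len = n;
    return s;
}

/* Compare strings such as "512K" and "2G" without converting them to
   numbers. Suffix rank decides first; the digits are compared only
   when both suffixes have the same rank. Returns <0, 0 or >0. */
static int human_compare(const char *a, const char *b)
{
    size_t alen, blen;
    int diff;

    a = scan_number(a, &alen);
    b = scan_number(b, &blen);

    diff = suffix_order(a[alen]) - suffix_order(b[blen]);
    if (diff != 0)
        return diff;                 /* suffixes differ: done */
    if (alen != blen)
        return alen < blen ? -1 : 1; /* more digits means larger */
    return memcmp(a, b, alen);       /* same length: digit-wise compare */
}
```

With this sketch, human_compare("999M", "1G") is negative as expected,
but so is human_compare("1030M", "1G"), which is the overlapping-range
limitation mentioned earlier in the thread. Both strings are scanned
once to find the suffix in every case, so the early return on differing
suffixes only saves the final digit comparison.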
_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils