Roger McNichols wrote:
>
> Currently using version 5.2.1 of coreutils 'split' command produces files
> with 'intelligent' suffixes. That is, the number of letters (or digits)
> required is based on the known number of output files that will be required.

Actually, coreutils does not employ 'intelligent' suffixes: the size of the
input is not taken into account, and the suffix length defaults to 2.
One could set it 'intelligently' outside of split using something like
the following, though this should really be done within split:

size=$(du -b "$file" | cut -f1)
chunk=4096
suffix_len=$(python3 -c "
import math
# letters needed to name ceil(size/chunk) output files in base 26
print(max(1, math.ceil(math.log(max($size / $chunk, 1), 26))))
")
split -b "$chunk" -a "$suffix_len" "$file"

> An OLD version of split (and I don't know which one because I don't have it
> anymore) used 'dumb' suffixes. That is, it would start with aa, ab, ac, ...,
> ba, bb, bc, ... until it got to zz and then would jump to zzaa, zzab, zzac,
> ... etc and then on to zzaaaa, zzaaab, zzaaac, etc...

I think I've seen this method before, but it's not in Solaris, FreeBSD or
alexautils? Grr, that's bugging me now. Whatever implementation of split
that was, it seems like a good way to split arbitrarily sized input while
keeping the file names in lexical sort order.

Also, if the file size _is_ known but the specified suffix length is too
short, one could use this algorithm to ensure that you don't get the
"suffixes exhausted" error.

In fact, for consistency it would probably be better to always default
to 2 as the suffix length and fall back to this zzaa suffix scheme, rather
than "intelligently" selecting the suffix length as described above.
I'll look at doing this soon.

thanks,
Pádraig.
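
For illustration only, the fallback suffix scheme discussed above might be
sketched roughly as follows. This is not how any version of split is known
to implement it. The helper name fallback_suffixes and its width parameter
are hypothetical, and the sketch assumes the stage after zzzz is zzzzaa
(the quoted "zzaaaa" would not sort after "zzzz") and that names are
compared byte-wise, as with LC_ALL=C.

```python
from itertools import count, islice, product
from string import ascii_lowercase


def fallback_suffixes(width=2):
    """Yield aa..zz, then zzaa..zzzz, then zzzzaa.., without end.

    Hypothetical helper, not part of coreutils: each time the current
    block of `width` letters is exhausted, its last value (all 'z')
    becomes a fixed prefix for the next block.
    """
    for level in count():
        prefix = "z" * (width * level)
        for letters in product(ascii_lowercase, repeat=width):
            yield prefix + "".join(letters)


if __name__ == "__main__":
    # Take the whole first block plus the first two names of the second.
    names = list(islice(fallback_suffixes(), 26 * 26 + 2))
    print(names[:3], names[675:678])
    # Plain string comparison preserves the generation order.
    assert names == sorted(names)
```

Each block of 676 names costs two extra letters, so this trades compact
names for never running out of suffixes while the input size is unknown.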
