I found a machine with the old version of split.
home:~> uname -a
Linux home 2.2.13 #4 Thu May 8 23:11:31 CDT 2003 i686 unknown
home:~>
home:~> split --version
split (GNU textutils) 1.22
home:~>

Here's the result of

home:~> cat /var/log/messages | split -2 - /tmp/x.

Not exactly as I recalled. Instead of adding zz the first time, it adds za (after the two-letter names end with yz), then starts adding zz... Anyway:

x.aa x.ab x.ac x.ad x.ae x.af x.ag x.ah x.ai x.aj
...
x.yv x.yw x.yx x.yy x.yz
x.zaaa x.zaab x.zaac x.zaad x.zaae x.zaaf
...
x.zyzt x.zyzu x.zyzv x.zyzw x.zyzx x.zyzy x.zyzz
x.zzaaaa x.zzaaab x.zzaaac x.zzaaad
...

___________________________
Roger J. McNichols, Ph.D.
Chief Scientist
BioTex, Inc.
8058 El Rio St.
Houston, TX 77054
713.741.0111 (o)
713.741.0122 (f)
832.338.4371 (m)

----- Pádraig Brady <[email protected]> wrote:
> Roger McNichols wrote:
> >
> > Thanks for the feedback.
> >
> >> Do you mean select the appropriate suffix length based on size,
> >> or do you mean the zzaa, zzab scheme? The former wouldn't
> >> help when processing a pipe for example so I'd probably
> >> stick with the latter method for consistency.
> >
> > Currently, split (at least 5.2.1) DOES pick the suffix size based on the
> > file size when used as "split -<#> file" and the file size is known.
>
> I checked the repo and can't see code supporting that.
> Perhaps you've got a locally modified `split`?
>
> > But as you point out, if the file is a pipe you may still run out of
> > suffixes if the file size changes after invocation of split, or if split
> > is used in the "split -<#> -" (reads stdin) mode, a 2-letter suffix is
> > all you get unless you specify a length.
> > Now I suppose that maybe the discussion went something like:
> > >> what if an unknown-sized input stream is the input?
> > >> well then just use -a 100 and you will never* run out...
> > (*note 26^100 is pretty big)
> >
> > Anyway, I propose to develop a new commandline option that would invoke
> > the 'old' suffix formation behavior. And even though aa ... zaa ...
> > zzaa ... instead of aa .. zzaa ... zzzzaa (as well as many other schemes)
> > would work just as well,
>
> Bzzt. zaa would sort before zb
> In general one needs to append 'z'*suffix_len which would default to 2 if
> not specified.
> One would need to consider this behaviour with digit suffixes also.
>
> > I propose to utilize the 'old' one for the added advantage of backward
> > compatibility.
>
> OK. While I like the scheme it would be really nice to see what we're being
> compatible with. I.e. it would be great if you found where the old split
> you used came from.
>
> > That way any code that relied on the old scheme for counting would be
> > able to be re-functionalized with a simple addition of a commandline
> > argument.
> >
> >> if the suffix len is specified and is too small.
> >> Otherwise we use the zzaa, zzab method as described before.
> >
> > This is also a good idea, but it might override the user's intention,
> > which could be to use split to detect a file that was more than 676*N
> > lines long, or to use it with the -1 option and only write out the first
> > 676 lines of the input
>
> That's exceedingly unlikely. It would be great to have the "unlimited"
> behaviour by default I think. As mentioned before we could have the
> "limited" behaviour if POSIXLY_CORRECT is set.
>
> > (who knows why, but we're fixing a fix that broke something else, right?)
>
> I can't see the code for the old behaviour so I wouldn't assume that.
>
> cheers,
> Pádraig.
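
For reference, the naming order in the listing at the top of this message can be sketched in a few lines of Python. This is only an illustration inferred from the observed output (aa..yz, then zaaa..zyzz, then zzaaaa..); it is not the old textutils code, and the name old_split_suffixes and its start_len parameter are hypothetical.

```python
from itertools import count, islice, product
from string import ascii_lowercase


def old_split_suffixes(start_len=2):
    """Yield suffixes in the observed order: aa..yz, zaaa..zyzz, zzaaaa.. (assumed scheme)."""
    head = ascii_lowercase[:-1]  # 'a'..'y'; a leading 'z' is reserved to mark a wider suffix
    for level in count():
        prefix = "z" * level          # one extra 'z' per widening step
        width = start_len + level     # letters that follow the 'z' prefix
        for first in head:
            for rest in product(ascii_lowercase, repeat=width - 1):
                yield prefix + first + "".join(rest)


names = list(islice(old_split_suffixes(), 17560))
print(names[:3])      # ['aa', 'ab', 'ac']
print(names[649])     # yz
print(names[650])     # zaaa
print(names[17550])   # zzaaaa
```

Because the first letter after the 'z' prefix never reaches 'z', every wider name sorts after all of the shorter ones, so a plain lexical sort (as with a shell glob such as x.*) keeps the pieces in the order they were written.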
