On 07/10/10 01:03, Pádraig Brady wrote:
> On 06/10/10 21:41, Assaf Gordon wrote:
>> Hello,
>>
>> I'd like to (re)suggest a feature for the join program - the ability to
>> automatically build an output format line (similar but easier than using
>> "-o").
>>
>> I've previously mentioned it here (but got no favorable responses):
>> http://lists.gnu.org/archive/html/bug-coreutils/2009-11/msg00151.html
>>
>> Several people have been using this option for a year now (on our local
>> servers), so I thought I might try to suggest it again.
>>
>> The full patch is attached, and also available here:
>> http://cancan.cshl.edu/labmembers/gordon/files/join_auto_format_2010_10_06.patch
>>
>> Here's the common use case:
>>
>> Given two tabular files, with a common key in the first column, and many
>> numeric (or other) values in the other columns, the user wants to join them
>> together easily.
>> One requirement is that empty/missing values should be populated with "00".
>>
>> File 1
>> ======
>> bar 10 13 15 16 11 32
>> foo 10 10 11 12 13 14
>>
>>
>> File 2
>> ======
>> bar 99 91 90 93 91 93
>> baz 90 91 99 96 97 95
>>
>>
>> Desired joined output
>> =====================
>> bar 10 13 15 16 11 32 99 91 90 93 91 93
>> baz 00 00 00 00 00 00 90 91 99 96 97 95
>> foo 10 10 11 12 13 14 00 00 00 00 00 00
>>
>> There is no technical problem in achieving this; the parameters would be:
>> "-a1 -a2 -e 00 -o 0,1.2,1.3,1.4,1.5,1.6,1.7,2.2,2.3,2.4,2.5,2.6,2.7"
>>
>> But building the "-o" parameter is cumbersome and error-prone (imagine
>> files with dozens of columns, which is very common in my case).
>>
>> The "--auto-format" feature simply builds the "-o" format line
>> automatically, based on the number of columns in both input files.
>
> Thanks for persisting with this and presenting a concise example.
> I agree that this is useful and can't think of a simple workaround.
> Perhaps the interface would be better as:
>
> -o {all (default), padded, FORMAT}
>
> where padded is the functionality you're suggesting?
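To make the proposal concrete, here is a minimal Python sketch of what an auto-format option would compute. It builds the join field (`0`) followed by every non-key field of file 1 and then of file 2. The helper names `build_join_format` and `auto_format` are hypothetical and not part of the patch. Counting fields with `str.split()` is an assumption that approximates join's default blank-separated fields.

```python
def build_join_format(nfields1: int, nfields2: int, key1: int = 1, key2: int = 1) -> str:
    """Build a join -o FORMAT string: the join field (0), then every
    non-key field of file 1, then every non-key field of file 2."""
    specs = ["0"]
    specs += [f"1.{i}" for i in range(1, nfields1 + 1) if i != key1]
    specs += [f"2.{i}" for i in range(1, nfields2 + 1) if i != key2]
    return ",".join(specs)


def auto_format(first_line1: str, first_line2: str, key1: int = 1, key2: int = 1) -> str:
    """Hypothetical helper: derive the format from the first line of each
    input, splitting on runs of whitespace (an approximation of join's
    default field separation)."""
    return build_join_format(len(first_line1.split()), len(first_line2.split()), key1, key2)


if __name__ == "__main__":
    print(auto_format("bar 10 13 15 16 11 32", "bar 99 91 90 93 91 93"))
```

For the two sample files this yields the same string as the hand-written `-o` argument above. Basing the counts on the first line is what makes the approach fragile when later lines have a different number of fields.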
Thinking more about it, we mightn't need any new options at all.
Currently -e is redundant if -o is not specified.
So how about changing that so that if -e is specified,
we operate as above by auto-inserting empty fields?
Also, I wouldn't base it on the number of fields in the first line,
but instead auto-pad to the biggest number of fields
on the current lines under consideration.

cheers,
Pádraig.
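The padding behaviour suggested above can be sketched as a merge-based full outer join (the equivalent of `-a1 -a2`) that fills the missing side with the `-e` string. This is a hypothetical Python illustration, not coreutils code, and it rests on several assumptions. `outer_join_padded` is an invented name. Inputs are assumed sorted on field 1 with unique keys, and Python string ordering stands in for join's locale collation. The pad width for a side is taken from that file's current line: the next unread line, or the last one once the file is exhausted. That is only one possible reading of "the current lines under consideration".

```python
from typing import List

Row = List[str]


def outer_join_padded(rows1: List[Row], rows2: List[Row], empty: str = "00") -> List[str]:
    """Full outer join on field 1, padding the missing side with `empty`.

    Assumes both inputs are sorted on field 1 and keys are unique per input.
    """

    def width(rows: List[Row], k: int) -> int:
        # Number of non-key fields on this input's "current" line:
        # the next unread line, or the last line once exhausted.
        if not rows:
            return 0
        return len(rows[min(k, len(rows) - 1)]) - 1

    out: List[str] = []
    i = j = 0
    while i < len(rows1) or j < len(rows2):
        if j >= len(rows2) or (i < len(rows1) and rows1[i][0] < rows2[j][0]):
            # Unpaired line from file 1: pad the file 2 side.
            key, left = rows1[i][0], rows1[i][1:]
            right = [empty] * width(rows2, j)
            i += 1
        elif i >= len(rows1) or rows2[j][0] < rows1[i][0]:
            # Unpaired line from file 2: pad the file 1 side.
            key, right = rows2[j][0], rows2[j][1:]
            left = [empty] * width(rows1, i)
            j += 1
        else:
            # Keys match: emit both sides unchanged.
            key, left, right = rows1[i][0], rows1[i][1:], rows2[j][1:]
            i += 1
            j += 1
        out.append(" ".join([key] + left + right))
    return out


if __name__ == "__main__":
    file1 = [line.split() for line in ["bar 10 13 15 16 11 32", "foo 10 10 11 12 13 14"]]
    file2 = [line.split() for line in ["bar 99 91 90 93 91 93", "baz 90 91 99 96 97 95"]]
    for line in outer_join_padded(file1, file2):
        print(line)
```

Applied to the File 1 and File 2 samples from the original message, this reproduces the desired joined output shown there. Because the width comes from each file's current line rather than its first line, inputs whose field counts vary from line to line are still padded sensibly.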
