On 02/12/2012 02:36 AM, [email protected] wrote:
> Hi Coreutils,
>
> I posted to the list about a month ago and haven't gotten any
> response. Perhaps it got overlooked because of the holidays (or
> post-holiday email glut), so I'm reposting it:
>
> ---
>
> Hello,
>
> I fairly recently discovered the joys of join, but now I wonder why it
> is limited to two files?
>
> In other words, I would like to do the following:
>
>   join file1 file2 ... fileN
>
> While I CAN achieve this through other methods, they are not ideal.
> For instance, paste works with multiple files, but then I must cut out
> the repeated key columns. The following also works, but doesn't
> generalize to filename expansions (e.g. `join file*`):
>
>   join file1 file2 | join - file3 | ... | join - fileN
>
> As for my use case, I am working with data files containing the
> results of multiple systems running over the same test items. I would
> like to compare the results of all systems for each item by putting
> them side-by-side.
>
> I don't know the history of the command, so I am not aware of any
> technical or ideological reasons why it shouldn't support more than
> two files. Any explanation appreciated!
Sorry I missed your previous mail.

Well, a join operation doesn't scale well across many files. You'd have
to stream from each file (hence keep a file descriptor open for each one)
and maintain an internal comparison buffer for each file. Hence join is
restricted to two "tables", which can be combined more generally as
you've shown above.

Note that for many files, and to handle globbing, you might want to use
temp files, which would also help with scalability since only two inputs
are open at any one time, like:

  set -- file*              # the files to combine, in glob order
  base=$(mktemp)
  cp "$1" "$base"           # start from a copy of the first file
  shift
  for file in "$@"; do
    nbase=$(mktemp)
    join "$base" "$file" > "$nbase"
    rm "$base"
    base=$nbase             # this result is the left input next time
  done
  cat "$base"
  rm "$base"

cheers,
Pádraig.
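The temp-file loop described in the reply can be wrapped in a small shell function so it accepts a glob such as `file*` directly. This is a minimal sketch, not a coreutils feature: `join_all` is a hypothetical name, and it assumes `mktemp` is available (as in GNU coreutils), that every input is already sorted on the first field as join requires, and that join's default whitespace field splitting suits the data.

```sh
#!/bin/sh
# join_all (hypothetical helper, not a coreutils command): join any number
# of files on their first field by repeatedly joining two at a time through
# temp files. Assumes each input is already sorted on the join field.
join_all() {
  if [ "$#" -lt 2 ]; then
    echo "usage: join_all FILE1 FILE2 [FILE...]" >&2
    return 1
  fi
  # Seed the accumulator with a copy of the first file.
  acc=$(mktemp) || return 1
  cp -- "$1" "$acc" || { rm -f -- "$acc"; return 1; }
  shift
  for f in "$@"; do
    next=$(mktemp) || { rm -f -- "$acc"; return 1; }
    # Only two inputs are open at once: the running result and one new file.
    if ! join -- "$acc" "$f" > "$next"; then
      rm -f -- "$acc" "$next"
      return 1
    fi
    rm -f -- "$acc"
    acc=$next   # this result becomes the left-hand input of the next join
  done
  cat -- "$acc"
  rm -f -- "$acc"
}

# Usage demo: three small inputs, each sorted on field 1.
tmpdir=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$tmpdir/file1"
printf 'a x\nb y\nc z\n' > "$tmpdir/file2"
printf 'a p\nc q\n' > "$tmpdir/file3"
join_all "$tmpdir"/file*
rm -r -- "$tmpdir"
```

With these sample inputs the demo prints `a 1 x p` and `c 3 z q`. Key `b` is dropped because it has no match in `file3`, which follows from join printing only paired lines by default.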
