Hi all,

I'm having a little trouble working out the best way to perform some
tabular file manipulations in Galaxy. I have several tabular files with
different numbers of columns, and I want to combine them on a single
identifier column (rows are combined only where the identifier
matches).


File 1 contains,
c1 = ID
c2 = Score1a

File 2 contains,
c1 = ID
c2 = Score2a
c3 = Score2b
c4 = Score2c

File 3 contains,
c1 = ID
c2 = Score3a
c3 = Score3b

Desired combined file containing:

c1 = ID
c2 = Score1a
c3 = Score2a
c4 = Score2b
c5 = Score2c
c6 = Score3a
c7 = Score3b
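
To make the target concrete, here is a minimal Python sketch (outside
Galaxy) of the merge I'm after. The function names, sample IDs and
scores are all invented for illustration, and it assumes each ID
appears at most once per file and that only IDs present in every file
are kept.

```python
import csv
import io


def read_tabular(text):
    """Parse tab-separated text into a list of rows (lists of strings)."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t")]


def join_on_id(tables, key=0):
    """Inner-join tables on column index `key`, keeping the key column once.

    Assumption: IDs are unique within each table.
    """
    # Index every table after the first by ID, dropping its own ID column.
    lookups = [
        {row[key]: row[:key] + row[key + 1:] for row in table}
        for table in tables[1:]
    ]
    merged = []
    for row in tables[0]:
        ident = row[key]
        # Keep the row only if the ID is present in every other table.
        if all(ident in lookup for lookup in lookups):
            out = list(row)
            for lookup in lookups:
                out.extend(lookup[ident])
            merged.append(out)
    return merged


# Made-up sample data with the same shapes as File 1, File 2 and File 3.
file1 = "a\t1.0\nb\t2.0\nc\t3.0\n"
file2 = "a\t10\t11\t12\nb\t20\t21\t22\n"
file3 = "b\t200\t201\na\t100\t101\n"

combined = join_on_id([read_tabular(f) for f in (file1, file2, file3)])
for row in combined:
    print("\t".join(row))
```

ID "c" is dropped here because it only appears in the first file.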

I have worked out how to do this with two calls to the "Join two
Datasets" tool, but each join repeats the join column (ID in this
example), so a final clean-up step with the "Cut" tool is required
(which breaks the column assignments).
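
For what it's worth, the column list for that final "Cut" step can be
worked out with something like the sketch below. It assumes each join
appends all columns of the second input after those of the first, and
the helper name is hypothetical.

```python
def columns_to_keep(column_counts, key=1):
    """Return 1-based column numbers to keep after chaining pairwise joins.

    Assumption: each join appends every column of the next file, so the
    key column of every file after the first is a redundant copy.
    """
    keep = []
    offset = 0
    for i, n in enumerate(column_counts):
        for c in range(1, n + 1):
            # Keep everything from the first file; skip later copies of the key.
            if i == 0 or c != key:
                keep.append(offset + c)
        offset += n
    return keep


# File 1 has 2 columns, File 2 has 4, File 3 has 3.
cut_spec = ",".join(f"c{c}" for c in columns_to_keep([2, 4, 3]))
print(cut_spec)  # c1,c2,c4,c5,c6,c8,c9
```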

The more flexible "Column Join" tool would let me combine an arbitrary
number of files, but it is designed for input files that all share the
same column structure.

Is there a better way to do this with Galaxy as it stands?

Alternatively, would an option on the "Join two Datasets" tool to omit
the redundant join column from the output be widely useful?

