Hi all,

I'm wondering if the following task can be done in Galaxy with the
standard tools. The specific example is selecting the top (e.g. 3)
matching sequences for each BLAST query, but I see this problem as much
more general than a "Select top BLAST hits" tool.

I want to select the first few (e.g. 1) rows of each group in a
tabular file, where rows belong to the same group if certain columns
are equal (e.g. the first 2).
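A minimal sketch of the proposed operation in Python, assuming tab-separated input in which the rows of a group are adjacent (as in tabular BLAST output, where all hits for a query appear together). The name first_rows_per_group and its parameters are hypothetical, not an existing Galaxy tool; key columns are 0-based here.

```python
import itertools
from typing import Iterable, Iterator, Sequence


def first_rows_per_group(
    lines: Iterable[str], key_columns: Sequence[int], n: int
) -> Iterator[str]:
    """Yield at most the first n lines of each run of consecutive lines
    whose key_columns (0-based) are equal. Assumes tab-separated input
    in which the members of a group are adjacent."""

    def key(line: str) -> tuple:
        fields = line.rstrip("\n").split("\t")
        return tuple(fields[i] for i in key_columns)

    for _, group in itertools.groupby(lines, key=key):
        # islice stops after n lines; groupby skips the rest of the group
        yield from itertools.islice(group, n)


if __name__ == "__main__":
    rows = ["queryA\tmatch1", "queryA\tmatch2", "queryA\tmatch2", "queryB\tmatch5"]
    # first row of each (query, match) group
    print(list(first_rows_per_group(rows, [0, 1], 1)))
```

Because itertools.groupby only merges adjacent rows, unsorted input would need sorting on the key columns first.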

For example, tabular BLAST output has columns for query ID, match ID, etc.:

queryA match1 ...
queryA match2 ...
queryA match2 ...
queryA match3 ...
queryA match4 ...
queryA match4 ...
queryA match4 ...
queryB match5 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...

In this example, some of my queries have more than one HSP per match
(more than one line with the same first two columns). If I group on
the first two columns, the groups are:

------------------------
queryA match1 ...
------------------------
queryA match2 ...
queryA match2 ...
------------------------
queryA match3 ...
------------------------
queryA match4 ...
queryA match4 ...
queryA match4 ...
------------------------
queryB match5 ...
queryB match5 ...
------------------------
queryC match6 ...
------------------------
queryC match7 ...
------------------------
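The grouping step on its own can be sketched in Python as follows, using the example rows with the "..." columns dropped (an assumption for brevity). It simply lists each group key with its number of rows, mirroring the dashed boxes above.

```python
import itertools

# Example rows from above, "..." columns omitted, tab-separated
rows = [
    "queryA\tmatch1", "queryA\tmatch2", "queryA\tmatch2",
    "queryA\tmatch3", "queryA\tmatch4", "queryA\tmatch4",
    "queryA\tmatch4", "queryB\tmatch5", "queryB\tmatch5",
    "queryC\tmatch6", "queryC\tmatch7",
]


def first_two_columns(line: str) -> tuple:
    """Group key: (query ID, match ID)."""
    return tuple(line.split("\t")[:2])


# groupby merges only adjacent rows with equal keys
groups = [(k, list(g)) for k, g in itertools.groupby(rows, key=first_two_columns)]
for k, members in groups:
    print(k, len(members))
```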

If I then take the first row in each group, that gives me just the
first HSP for each query+match combination.

queryA match1 ...
queryA match2 ...
queryA match3 ...
queryA match4 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...
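For the special case of keeping one row per group, a set of keys already seen is enough. This Python sketch (the function name first_hsp_per_pair is hypothetical) keeps the first line for each query+match pair; unlike the adjacent-row grouping, it also gives this result when the HSPs of a pair are not next to each other.

```python
from typing import Iterable, Iterator


def first_hsp_per_pair(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first line seen for each (query ID, match ID) pair."""
    seen = set()
    for line in lines:
        pair = tuple(line.rstrip("\n").split("\t")[:2])
        if pair not in seen:  # later HSPs for the same pair are skipped
            seen.add(pair)
            yield line


# Example rows from above, "..." columns omitted
rows = [
    "queryA\tmatch1", "queryA\tmatch2", "queryA\tmatch2",
    "queryA\tmatch3", "queryA\tmatch4", "queryA\tmatch4",
    "queryA\tmatch4", "queryB\tmatch5", "queryB\tmatch5",
    "queryC\tmatch6", "queryC\tmatch7",
]
first_pass = list(first_hsp_per_pair(rows))
for line in first_pass:
    print(line)
```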

If, for example, I wanted only the top 3 matches for each query, I could
run the proposed tool once more with different settings, this time
grouping on the first column only and keeping the first 3 rows:

queryA match1 ...
queryA match2 ...
queryA match3 ...
queryB match5 ...
queryC match6 ...
queryC match7 ...
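The second pass can be sketched in Python with a per-query counter, applied here to the first-pass rows listed earlier ("..." columns omitted). The function name first_n_per_query is hypothetical; like the seen-set approach, counting does not require the rows of a query to be adjacent.

```python
from collections import Counter
from typing import Iterable, Iterator


def first_n_per_query(lines: Iterable[str], n: int) -> Iterator[str]:
    """Keep at most the first n lines for each query ID (first column)."""
    counts: Counter = Counter()
    for line in lines:
        query = line.rstrip("\n").split("\t", 1)[0]
        counts[query] += 1
        if counts[query] <= n:
            yield line


# First-pass output (one row per query+match pair)
first_pass = [
    "queryA\tmatch1", "queryA\tmatch2", "queryA\tmatch3", "queryA\tmatch4",
    "queryB\tmatch5", "queryC\tmatch6", "queryC\tmatch7",
]
top3 = list(first_n_per_query(first_pass, 3))
for line in top3:
    print(line)
```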

I hope I've conveyed the idea here. The existing tools "Select first
lines from a dataset" and "Select last lines from a dataset" are
related, but they work on the file as a whole rather than on each group.

Does this make sense? Does it seem like a useful tool to write if
there isn't anything like this already present? Or might it be simpler
to just write a "Select top BLAST hits" tool?

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
