Hi all,

As I mentioned on Twitter, at the end of last week I wrapped Blast2GO
for Galaxy, using the b2g4pipe program (Blast2GO for pipelines). See

The current code is on Bitbucket under my tools branch,

Specifically, the files tools/ncbi_blast_plus/blast2go.* are viewable here:

I'm using a Galaxy location file, tool-data/blast2go.loc, to offer
one or more Blast2GO configurations (properties files), mapping this
to the -prop argument. This way you could have for example the Spanish
Blast2GO server with its current database (May 2010), and a local
Blast2GO database. I want to set up a local database and try this
before submitting the wrapper to the Tool Shed.
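
To illustrate the location file idea, here is a rough Python sketch of
reading such a file. Galaxy *.loc files are tab separated with '#'
comment lines; the two column layout (label, then properties file
path), the function name read_loc and the example path are assumptions
for illustration only, not necessarily what blast2go.loc will use.

```python
def read_loc(lines):
    """Return (label, properties_path) pairs from tab-separated .loc lines.

    Assumed (hypothetical) layout: column 1 is a display label,
    column 2 is the path to a Blast2GO properties file.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # Skip malformed lines with no path column.
        entries.append((fields[0], fields[1]))
    return entries
```

The selected path would then be what gets passed to b2g4pipe via the
-prop argument.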

The input to the tool is a BLAST XML file, specifically blasting
against a protein database like NR (so blastp or blastx, not blastn
etc). I want to try some very large BLAST XML files to confirm
b2g4pipe copes with the current BLAST+ output files - I gather there
were some problems in this area in the past, so having the wrapper
script fragment the XML might be a workaround. Currently the only real
function of the wrapper script is to rename the output file - b2g4pipe
insists on using the *.annot extension.
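
As a rough sketch of that renaming step (not the actual wrapper code),
assuming b2g4pipe was given an output prefix and has written
prefix + ".annot", the file can be moved to the path Galaxy expects.
The function name and both arguments are hypothetical.

```python
import os
import shutil


def collect_annot(prefix, galaxy_output):
    """Move the <prefix>.annot file written by b2g4pipe to galaxy_output."""
    annot_path = prefix + ".annot"
    if not os.path.isfile(annot_path):
        raise IOError("Expected output file %s was not created" % annot_path)
    # shutil.move also works when source and target are on different file systems.
    shutil.move(annot_path, galaxy_output)
    return galaxy_output
```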

Right now the only output is a tabular three column *.annot file,
which can be loaded into the Blast2GO GUI. For analysis within Galaxy,
I'm wondering about an option to split the first column (which holds
the original FASTA query's identifier and any description) in two.
i.e. split at the first white space to give the FASTA identifier, and
any optional description as a separate column. That would make
linking/joining/filtering on the ID much easier.
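
A minimal sketch of that split in Python, assuming each line of the
*.annot file is tab separated. The function name and the example values
in the usage note are made up for illustration.

```python
def split_annot_line(line):
    """Split the first column of a tab-separated line at the first white space.

    Returns the FASTA identifier, the description (empty string if there
    is none), and then the remaining columns unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    # split(None, 1) splits once on any run of white space.
    parts = fields[0].split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return [seq_id, description] + fields[1:]
```

For example, a line starting "seq1 hypothetical protein" followed by
two more columns would become four columns: "seq1", "hypothetical
protein", and the original second and third columns.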

If anyone has any comments or feedback now, that would be welcome.
Yesterday Alex Bossers indicated on Twitter that Gerrit had also been
looking at this (CC'd).

