Hi all,

I've written a couple of Galaxy wrappers for the NCBI
BLAST+ tool blastdbcmd, which replaces the NCBI legacy
BLAST tool fastacmd.

The first wrapper lets you get a FASTA file of sequences
from a database by their ID (which works best if your
database was built with -parse_seqids), while the second
just shows information about a database, such as the
number of sequences and total length (human readable text).
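
For anyone who has not used blastdbcmd directly, here is
a rough sketch in Python of the two kinds of command line
involved. The helper names, the database name and the IDs
are made up for illustration, and this is not how the
wrapper XML itself is written; only the -db, -entry, -out
and -info switches are blastdbcmd's own.

```python
def fetch_command(db, ids, out_path):
    """Build a blastdbcmd call that writes the requested IDs as FASTA.

    -entry takes a comma-delimited list of sequence identifiers.
    """
    return ["blastdbcmd", "-db", db, "-entry", ",".join(ids), "-out", out_path]


def info_command(db):
    """Build a blastdbcmd call that prints the database summary text."""
    return ["blastdbcmd", "-db", db, "-info"]


# Hypothetical database name and identifiers, for illustration only.
print(" ".join(fetch_command("my_genome", ["seq1", "seq2"], "out.fasta")))
print(" ".join(info_command("my_genome")))
```

Either list could be handed to subprocess.run() as-is,
assuming blastdbcmd is on the PATH.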

Branch here:
https://bitbucket.org/peterjc/galaxy-central/src/blastdbcmd

Two files:
tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml
tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml

Is anyone interested in helping to test these before I
ask the Galaxy team to merge them into the trunk?

Those of you familiar with the command line tool's
options will know you can use -entry all to get all the
sequences in the database. This is fine for a small
database (e.g. a single genome), but would be a
really bad idea for something like the NCBI NR
database. Currently there is no safety check for this,
although one could be added with a wrapper script that
first asks via the -info switch how many sequences there
are. Do you think some defensive code is a good idea
here, e.g. a limit of 5000 sequences when "all" is used?
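
To make the idea concrete, here is a minimal sketch of
what such a check might look like in Python. It assumes
the -info text contains a line of the form
"N sequences; ..." with optional thousands separators,
which should be checked against real output before
relying on it. The function names and the MAX_SEQUENCES
constant are hypothetical, and none of this is in the
branch yet.

```python
import re
import subprocess

MAX_SEQUENCES = 5000  # the limit floated above; purely a suggestion


def parse_sequence_count(info_text):
    """Pull the sequence count out of blastdbcmd -info text.

    Assumes a line such as "12,345 sequences; ..." (assumed format).
    """
    match = re.search(r"([\d,]+)\s+sequences", info_text)
    if match is None:
        raise ValueError("No sequence count found in -info output")
    return int(match.group(1).replace(",", ""))


def database_sequence_count(db):
    """Run blastdbcmd -info on db and return its sequence count."""
    result = subprocess.run(
        ["blastdbcmd", "-db", db, "-info"],
        capture_output=True, text=True, check=True,
    )
    return parse_sequence_count(result.stdout)


def entry_all_allowed(count, limit=MAX_SEQUENCES):
    """Return True if dumping every sequence stays within the limit."""
    return count <= limit
```

A wrapper script could call database_sequence_count()
only when the user asks for "all", and exit with an error
message if entry_all_allowed() returns False.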

Thanks,

Peter
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
