Hi all, I've written a couple of Galaxy wrappers for the NCBI BLAST+ tool blastdbcmd, which replaces the legacy NCBI BLAST tool fastacmd.
The first wrapper extracts sequences from a database by their IDs and returns them as a FASTA file. This works best if the database was built with makeblastdb -parse_seqids. The second wrapper simply reports information about a database, such as the number of sequences and the total length, as human-readable text.

Branch here: https://bitbucket.org/peterjc/galaxy-central/src/blastdbcmd

Two files:
tools/ncbi_blast_plus/ncbi_blastdbcmd_info.xml
tools/ncbi_blast_plus/ncbi_blastdbcmd_wrapper.xml

Is anyone interested in helping to test these before I ask the Galaxy team to merge them into the trunk?

Those of you familiar with the command line tool's options will know that you can use -entry all to get every sequence in the database. This is fine for a small database (e.g. a single genome), but would be a really bad idea for something like the NCBI NR database. Currently there is no safety check for this, although one could be added with a wrapper script that first calls blastdbcmd with the -info switch to find out how many sequences the database holds. Do you think some defensive code like this is a good idea, e.g. a limit of 5000 sequences when "all" is used?

Thanks,

Peter
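For illustration, the safety check described above might look something like the Python sketch below. It is only a sketch under stated assumptions, not part of the branch: the script itself, the names count_sequences, check_entry_all and MAX_SEQS_FOR_ALL, and the 5000 default are all hypothetical. It also assumes that blastdbcmd -info prints a line of the form "1,234 sequences; 567,890 total letters" and that Python 3.7+ is available (for subprocess.run with capture_output).

```python
import re
import subprocess
import sys

# Hypothetical limit from the proposal above; this is NOT a blastdbcmd option.
MAX_SEQS_FOR_ALL = 5000


def count_sequences(info_text):
    """Parse the sequence count from `blastdbcmd -info` output.

    Assumes a line such as '    1,234 sequences; 567,890 total letters'.
    """
    match = re.search(r"([\d,]+) sequences?;", info_text)
    if match is None:
        raise ValueError("Could not find the sequence count in -info output")
    # Strip the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


def check_entry_all(info_text, limit=MAX_SEQS_FOR_ALL):
    """Return the sequence count, or raise ValueError if it exceeds the limit."""
    n = count_sequences(info_text)
    if n > limit:
        raise ValueError(
            f"Database has {n} sequences; refusing -entry all (limit {limit})"
        )
    return n


def main(db, entry, out_fasta):
    if entry == "all":
        # Ask blastdbcmd how big the database is before dumping everything.
        info = subprocess.run(
            ["blastdbcmd", "-db", db, "-info"],
            capture_output=True, text=True, check=True,
        ).stdout
        try:
            check_entry_all(info)
        except ValueError as err:
            sys.exit(str(err))  # Non-zero exit so Galaxy flags the job as failed.
    subprocess.run(
        ["blastdbcmd", "-db", db, "-entry", entry, "-out", out_fasta],
        check=True,
    )


if __name__ == "__main__":
    main(*sys.argv[1:4])
```

A tool XML could then call something like python blastdbcmd_safe.py $database all $output (the script name is hypothetical) instead of invoking blastdbcmd directly. Requests for specific IDs skip the -info step, so only "all" pays the cost of the extra call.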