I have solved this using blast's '-l' (lowercase L) switch, which
restricts the search to a list of GIs. Every sequence in your blastable
database will need to have a GI number in its FASTA header.
1. Build your blastable DB with formatdb's '-o' switch (-o T). This
creates an index of the GIs.
2. Make a text file, filter.txt, listing all of the GIs you'd like to
leave in (every GI in your blastable database, minus those from the
organisms you don't care about).
3. Run blast (or mpiblast) with the extra switches (-l filter.txt).
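
Step 2 above can be sketched in a few lines of Python. This is only a
rough illustration, not the script I actually use: the function names
are made up for the example, and it assumes headers that start with
"gi|<number>" and a filter file with one GI per line.

```python
import re

# Matches a leading "gi|<digits>" in a FASTA defline such as ">gi|12345|gb|X00001|".
GI_PATTERN = re.compile(r"^>gi\|(\d+)")


def collect_gis(fasta_lines):
    """Return the GI numbers found in FASTA header lines, in file order."""
    gis = []
    for line in fasta_lines:
        match = GI_PATTERN.match(line)
        if match:
            gis.append(int(match.group(1)))
    return gis


def build_filter(all_gis, excluded_gis):
    """Keep every GI that is not in the excluded set, preserving order."""
    excluded = set(excluded_gis)
    return [gi for gi in all_gis if gi not in excluded]


def write_filter(gis, path):
    """Write one GI per line (the layout assumed here for filter.txt)."""
    with open(path, "w") as handle:
        for gi in gis:
            handle.write(f"{gi}\n")
```

Feed collect_gis() the lines of the FASTA file you formatted, pass the
result through build_filter() with the GIs you want to drop, and hand
the file written by write_filter() to '-l'.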
Notes:
You can use '(mpi)formatdb -F filter.txt -B filter.bin' to make a
smaller, faster binary version of the filter. However, this is only
worth the time if you will be reusing filter.bin for several blast runs.
Binary filters would be perfect if you only need a couple of filters
(mouse-only, non-mouse, human-only, non-human, etc.). Instead, I build a
new filter dynamically for each run.
With NCBI's taxonomy database you can map taxon IDs to GIs.
This can be used to construct arbitrary filters, and may prove useful
for others with more exotic needs.
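
As a rough sketch of the taxon-to-GI mapping: the code below assumes a
two-column, tab-separated dump of "<gi> <taxid>" pairs, which is how I
understand NCBI's gi_taxid_nucl.dmp to be laid out. The function name is
made up for the example.

```python
def gis_for_taxa(dump_lines, wanted_taxids):
    """Yield GIs whose taxon ID is in wanted_taxids.

    Each input line is assumed to hold "<gi><TAB><taxid>".
    """
    wanted = set(wanted_taxids)
    for line in dump_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gi, taxid = int(fields[0]), int(fields[1])
        if taxid in wanted:
            yield gi
```

For a mouse-only filter you would pass {10090} (Mus musculus) and write
the resulting GIs to filter.txt. Note that this matches exact taxon IDs
only; it does not expand a taxon to its descendants, so subspecies or
strains with their own IDs would need to be listed explicitly.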
If your sequences don't have GIs, you can make up your own.
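
One way to make up GIs is to number the sequences yourself while
rewriting the headers. The sketch below is an assumption-heavy example:
the function name is hypothetical, and the "gi|<n>|lcl|<original id>"
header layout is my guess at something formatdb -o will parse, so check
it against your own formatdb before relying on it.

```python
def add_fake_gis(fasta_lines, start=1):
    """Prefix each FASTA header with a made-up, sequential GI number."""
    next_gi = start
    for line in fasta_lines:
        if line.startswith(">"):
            # ">seq1 some description" becomes ">gi|<n>|lcl|seq1 some description"
            yield f">gi|{next_gi}|lcl|{line[1:]}"
            next_gi += 1
        else:
            # Sequence lines pass through unchanged.
            yield line
```

If you mix these with real NCBI sequences, pick a starting number that
cannot collide with any GI already in the database.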
The -l filter seems to add about 2 minutes to each of my blast runs
when using the complete nt database as my blastable database. This may
be prohibitive overhead in some environments. My blasts typically last
for days, so it's negligible.
Earlier versions of mpiblast 1.4 reloaded the filter for each sequence
in the query file. This was a painful I/O hit as well as a memory hog
for some inputs. I've resolved this, but since it seems no one else uses
-l, I'm not sure whether my changes have been correctly folded in. The
problem is definitely resolved in my local copy.
Credit goes to Ian Korf's BLAST book for being the only reference I've
ever seen for -l. I'll warn you that it's a one-line reference, in case
you're thinking of picking it up just for this reason. But the book is
valuable for lots of other reasons as well.
There are enough little gotchas that this isn't as trivial as you might
be hoping, but if this seems like something that might solve your
problem I'll be happy to provide more info.
RAYMOND CHAN wrote:
Hi all,
I'm more on the computer than biological side, so excuse my ignorance on
this, but you guys have been a big help in the past and hope someone can
answer my question.
I have mpiblast running well on my system with a web interface that has
the basic Blast template that comes with NCBI's wwwblast. A user of
mine asked me if there was a way I can limit by entrez/organism as seen
on the NCBI version's web interface for Blast near the bottom of the
page:
http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&CLIENT=web&DATABASE=nr&DESCRIPTIONS=100&ENTREZ_QUERY=%28none%29&EXPECT=10&FILTER=L&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&GENETIC_CODE=1&NCBI_GI=on&PAGE=Translations&PROGRAM=tblastx&SERVICE=plain&SET_DEFAULTS.x=21&SET_DEFAULTS.y=9&SHOW_OVERVIEW=on&UNGAPPED_ALIGNMENT=yes&END_OF_HTTPGET=Yes&SHOW_LINKOUT=yes
Does mpiblast have a flag or some kind of option for me to do this? Do
databases have to be formatted in a specific way to enable this? Or
does NCBI just BLAST against everything, and then based on your limiting
run some kind of script or finalizing function to parse out the
appropriate stuff you want? This would be the easy way, but efficiency
and speed would be the problem here because of the wasted computation
cycles on stuff the user doesn't need. I'm guessing this is probably
not their approach. Any ideas or advice would be helpful in
implementing this option, no matter how tedious or difficult.
Thank you very much,
Ray Chan
Plant Sciences BioComputing Center
Univ. of Calif, Davis
-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users