I have solved this using blast's '-l' (lowercase L) switch.


Every sequence in your blastable database needs to have a GI number in its FASTA header.

1. Build your blastable DB with the '-o' switch. This creates an index of the GIs.

2. Make a text file, filter.txt, listing all of the GIs you'd like to leave in (every GI in your blastable database, minus those of the organisms you don't care about).

3. Run blast (or mpiblast) with the extra switches (-l filter.txt).
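
A minimal sketch of steps 2 and 3 in Python might look like the following. It assumes headers of the form ">gi|12345|..." and one GI per line in the filter file. The function names, the file names db.fasta and query.fasta, and the choice of blastn are placeholders made up for illustration. The blastall command line is only built, not run.

```python
import re

# Matches the GI in a header such as ">gi|12345|gb|ABC123|some description".
GI_HEADER = re.compile(r"^>gi\|(\d+)\|")


def collect_gis(fasta_lines):
    """Return the GI numbers found in the FASTA header lines, in order."""
    gis = []
    for line in fasta_lines:
        match = GI_HEADER.match(line)
        if match:
            gis.append(int(match.group(1)))
    return gis


def write_filter(all_gis, unwanted_gis, path):
    """Write one GI per line, skipping the unwanted ones; return the kept GIs."""
    unwanted = set(unwanted_gis)
    kept = [gi for gi in all_gis if gi not in unwanted]
    with open(path, "w") as handle:
        for gi in kept:
            handle.write(f"{gi}\n")
    return kept


def blast_command(database, query, filter_path, program="blastn"):
    """Build (but do not run) a blastall command line that uses the -l filter."""
    return ["blastall", "-p", program, "-d", database,
            "-i", query, "-l", filter_path]


if __name__ == "__main__":
    # Hypothetical two-sequence database; the GI numbers are invented.
    example = [
        ">gi|101|gb|AAA00001.1| mouse sequence",
        "ACGT",
        ">gi|102|gb|AAA00002.1| rice sequence",
        "GGCC",
    ]
    kept = write_filter(collect_gis(example), [101], "filter.txt")
    print(kept)
    print(" ".join(blast_command("db.fasta", "query.fasta", "filter.txt")))
```

For mpiblast the same '-l filter.txt' argument would be appended to the mpiblast command line instead.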


Notes:

You can use '(mpi)formatdb -F filter.txt -B filter.bin' to make a smaller, faster binary version of the filter. However, this is only worth the time if you will be reusing filter.bin for several blast runs.

Binary filters would be perfect if you only need a couple of filters (mouse-only, non-mouse, human-only, non-human, etc.). Instead, I build a new filter dynamically for each run.


With NCBI's taxonomy database you can map taxon IDs to GIs.

This can be used to construct arbitrary filters, and may prove useful for others with more exotic needs.
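
One way such a mapping might be turned into a filter is sketched below. It assumes a two-column, tab-separated GI-to-taxon file with the GI first and the taxon ID second, which is the layout expected from a dump like NCBI's gi_taxid_nucl.dmp. The function name and the sample GIs are invented for illustration. It matches taxon IDs exactly and does not walk the taxonomy tree, so excluding a whole clade would require listing the descendant taxon IDs as well.

```python
def gis_excluding_taxa(mapping_lines, excluded_taxids):
    """Yield GIs whose taxon ID is not in excluded_taxids.

    mapping_lines: iterable of "gi<TAB>taxid" strings (assumed layout).
    """
    excluded = set(excluded_taxids)
    for line in mapping_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gi_text, taxid_text = line.split("\t")
        if int(taxid_text) not in excluded:
            yield int(gi_text)


if __name__ == "__main__":
    # Made-up GIs paired with taxon IDs for mouse (10090),
    # Arabidopsis thaliana (3702) and human (9606).
    sample = ["101\t10090", "102\t3702", "103\t9606"]
    # Drop mouse; print the remaining GIs one per line, as a filter file would list them.
    for gi in gis_excluding_taxa(sample, {10090}):
        print(gi)
```

In practice the result would still need to be restricted to GIs that are actually present in your blastable database.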

If your sequences don't have GIs, you can make up your own.
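
As a rough sketch of what making up your own GIs could look like, the Python below numbers the sequences sequentially and prepends the number to each header. The 'gi|N|lcl|id' header layout is an assumption and should be checked against what your formatdb version accepts with '-o'. The function name and sample records are hypothetical.

```python
def add_made_up_gis(fasta_lines, first_gi=1):
    """Return FASTA lines with 'gi|N|lcl|' prepended to each header.

    The 'gi|N|lcl|id' layout is an assumption; sequence lines pass through
    unchanged. N starts at first_gi and increases by one per sequence.
    """
    next_gi = first_gi
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            out.append(f">gi|{next_gi}|lcl|{line[1:]}")
            next_gi += 1
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    records = [">seqA my protein", "MKV", ">seqB", "MLL"]
    for line in add_made_up_gis(records):
        print(line)
```

The invented numbers need to be unique within the database, so pick a starting value that cannot collide with any real GIs stored alongside them.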

The -l filter seems to add about 2 minutes to each of my blast runs when using the complete nt database as my blastable database. This may be prohibitive overhead in some environments. My blasts typically last for days, so it's negligible for me.

Earlier versions of mpiblast 1.4 reloaded the filter for each sequence in the query file. This was a painful IO hit as well as a memory hog for some inputs. I've resolved this, but since it seems no one else uses -l, I'm not sure whether my changes have been folded in correctly. The problem is definitely resolved in my local copy.


Credit to Ian Korf's BLAST book for being the only reference I've ever seen for -l. I'll warn you that it's a one-line reference, in case you're thinking of picking it up just for this reason. But the book is valuable for lots of other reasons as well.




There are enough little gotchas that this isn't as trivial as you might be hoping, but if this seems like something that might solve your problem I'll be happy to provide more info.







RAYMOND CHAN wrote:
Hi all,

I'm more on the computer than biological side, so excuse my ignorance on
this, but you guys have been a big help in the past and hope someone can
answer my question.

I have mpiblast running well on my system with a web interface that has
the basic Blast template that comes with NCBI's wwwblast.  A user of
mine asked me if there was a way I can limit by entrez/organism as seen
on the NCBI version's web interface for Blast near the bottom of the
page:

http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&CLIENT=web&DATABASE=nr&DESCRIPTIONS=100&ENTREZ_QUERY=%28none%29&EXPECT=10&FILTER=L&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&GENETIC_CODE=1&NCBI_GI=on&PAGE=Translations&PROGRAM=tblastx&SERVICE=plain&SET_DEFAULTS.x=21&SET_DEFAULTS.y=9&SHOW_OVERVIEW=on&UNGAPPED_ALIGNMENT=yes&END_OF_HTTPGET=Yes&SHOW_LINKOUT=yes

Does mpiblast have a flag or some kind of option for me to do this?  Do
databases have to be formatted in a specific way to enable this?  Or
does NCBI just BLAST against everything, and then based on your limiting
run some kind of script or finalizing function to parse out the
appropriate stuff you want?  This would be the easy way, but efficiency
and speed would be the problem here because of the wasted computation
cycles on stuff the user doesn't need.  I'm guessing this is probably
not their approach.  Any ideas or advice would be helpful in
implementing this option, no matter how tedious or difficult.

Thank you very much,
Ray Chan
Plant Sciences BioComputing Center
Univ. of Calif, Davis



-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users



