Hi Aaron,
Thank you for your considerate reply. Your answer is very clear.
However, I still have some questions:
> It is assumed that the rank 0 process has access to the complete
> database so that it can calculate the correct effective lengths for
> the entire DB
- How does the rank 0 process know about the complete database if it
has been fragmented? Or is the rank 0 process simply passed the lengths
of the complete DB as parameters?
- What does the patch to the NCBI Toolbox basically do, conceptually?
> For some workloads, the search space calculation can be rather time-consuming
- About calculating the effective length: is it done for every query
sequence? Is that why it is so costly?
Thank you very much.
Daniel
*****************************************************************
* Daniel Xavier de Sousa *
* Master's student in Informatics - PUC-Rio *
* E-MAIL : dsousaARROBAinf.puc-rio.br *
* Phone : +55 21 35271500 - 4543 *
****************************************************************
----- Original message ----
From: Aaron Darling <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, 23 March 2007 22:16:03
Subject: Re: [Mpiblast-users] statistics of MPIBlast
G'day Daniel,
At the moment we don't have a peer-reviewed journal publication
describing how mpiBLAST deals with computing the e-value statistics.
The most precise description of how it works is of course the mpiBLAST
code itself and the associated patch to the NCBI Toolbox, although I'll
save you the time and trouble of having to muck around in the code by
summarizing the important bits :)
BLAST e-value statistics represent the expected number of times one
would see a hit with a particular bit-score by chance in a random
database of the same size as the target database. Several assumptions
about models of evolution and other factors go into the score
calculation. Many of the model's assumptions about evolution are
frighteningly simplistic, but the statistics seem to work reasonably well
in practice. If you're interested in that aspect I'll refer you to the
many papers written by Karlin and Altschul. The scoring model details
are mostly irrelevant to mpiBLAST because it uses the NCBI BLAST code to
do all the hit scoring and e-value computation. The 1.4.0 release of
mpiBLAST has the rank 0 MPI process compute the effective query and
database lengths which are used for e-value calculation prior to
beginning the parallel search. The "effective" length of a sequence
represents the total amount of sequence remaining after BLAST has
performed low-complexity sequence filtering using the DUST algorithm or
something similar. It is assumed that the rank 0 process has access to
the complete database so that it can calculate the correct effective
lengths for the entire DB. The rank 0 mpiblast process calls functions
in the NCBI Toolbox to filter the sequences and calculate the effective
search space without actually performing the search. Once effective
query and database lengths have been calculated by rank 0, the values
are broadcast with MPI (cf. MPI_Bcast()) to the rest of the processes. Those values are then
used by worker processes in place of values computed on individual
database fragments.
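
To illustrate why the global lengths matter, here is a minimal sketch
(not mpiBLAST or NCBI Toolbox code). It assumes the standard
Karlin-Altschul relation between bit score and e-value,
E = m' * n' * 2^(-S'), and, as a simplification, that the effective
database length is just the sum of the fragment lengths. The function
name and all numbers are hypothetical.

```python
def evalue(bit_score, eff_query_len, eff_db_len):
    """Expected number of chance hits: E = m' * n' * 2^(-S')."""
    return eff_query_len * eff_db_len * 2.0 ** (-bit_score)


# Hypothetical numbers, for illustration only.
eff_query_len = 300
fragment_db_lens = [2_000_000, 3_000_000, 5_000_000]

# Simplifying assumption: the whole-DB effective length is the sum of
# the fragment lengths. In mpiBLAST 1.4.0 this value is computed by
# rank 0 on the complete database and sent to the workers.
global_db_len = sum(fragment_db_lens)

bit_score = 40.0

# What each worker would report if it only knew its own fragment.
per_fragment = [evalue(bit_score, eff_query_len, n) for n in fragment_db_lens]

# What every worker reports once it uses the global value instead.
with_global = evalue(bit_score, eff_query_len, global_db_len)

for n, e in zip(fragment_db_lens, per_fragment):
    print(f"fragment of length {n}: E = {e:.3e}")
print(f"whole database ({global_db_len}): E = {with_global:.3e}")
```

The same hit gets a smaller (more optimistic) e-value on every
fragment than on the whole database, which is why the workers replace
their fragment-local lengths with the values from rank 0.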
This older e-mail may also be relevant:
http://bioinformatics.org/pipermail/bioclusters/2005-January/002173.html
The effective search space calculations are a serial component of the
mpiBLAST 1.4.0 implementation. For some workloads, the search space
calculation can be rather time-consuming, making it an excellent target
for parallelization in future mpiBLAST versions...
See also this discussion about using an e-value approximation to get
around the time-consuming serial part of the 1.4.0 implementation:
http://www.mail-archive.com/[email protected]/msg00175.html
http://www.mail-archive.com/[email protected]/msg00177.html
Hope that helps,
-Aaron
Daniel Xavier de Sousa wrote:
>
> Hi all,
>
>
>
> I'm studying BLAST statistics and fragmented databases
> for BLAST.
>
> I have read some of the mpiBLAST papers, but I didn't find anything
> that explains HOW mpiBLAST obtains the exact e-value statistics of
> NCBI BLAST.
>
>
>
> Please, where can I read/study about this?
>
>
>
> Thanks
>
> Daniel
>
> *****************************************************************
> * Daniel Xavier de Sousa *
> * Master's student in Informatics - PUC-Rio *
> * E-MAIL : dsousaARROBAinf.puc-rio.br *
> * Phone : +55 21 35271500 - 4543 *
> ****************************************************************
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users