Re: [BiO BB] [Bioperl-l] Fwd: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output...

Chris Fields Mon, 23 Mar 2009 14:18:02 -0700

We have a way to check for both within HSP objects I believe.

1) gaps() is documented to return the number of gap characters withinthe query/hit/total2) seq_inds('gap', 'hit/query') returns the number of gap positions,with the position repeated for every gap character I believe, sogetting this in scalar context should be similar to gaps(). However,to reduce that down to just the gap positions (no repeats) use thecollapse flag: seq_inds('gap', 'hit/query', 1)


chris

On Mar 20, 2009, at 12:13 PM, Dan Bolser wrote:

Here is what the man from the NCBI said:


---------- Forwarded message ----------
From: Peter Cooper <[email protected]>
Date: 2009/3/20
Subject: Re: [blast-help] Fwd: Question about the definition of 'gaps'
in blast -m8 output...
To: [email protected]
Cc: [email protected]


Hello,

The number reported tin the -m  8 output is the number of gap
openings. This will only equal the number of gap characters if the
length of each gap is 1.


Peter
-------------------------------
Peter S. Cooper, Ph.D.
Public Services
The National Center for Biotechnology Information
301-435-5951






On Mar 19, 2009, at 12:04 PM, romiti wrote:
Begin forwarded message:
From: User Services Service Account <[email protected]>
Date: March 19, 2009 10:23:01 AM EDT
To: [email protected]
Subject: Question about the definition of 'gaps' in blast -m8output...
Reply-To: User Services Service Account <[email protected]>


------------- Begin Forwarded Message -------------

Date: Wed, 18 Mar 2009 19:31:01 +0000
Subject: Question about the definition of 'gaps' in blast -m8output...
From: Dan Bolser <[email protected]>
To: [email protected], [email protected], [email protected]


Hi,
I'm sure this question comes up again and again, but searching theBioPerlmailing list didn't turn up any answers (to the second question).BasicallyI want to manually merge HSP's into 'contigious hits', and I wantto look atthe effect of various parameters on an algorithm to do this. Thistaskprompted me to run a 'sanity check' on the blast data that I had,and Ifound that this check fails to fulfil my expectation of the data.This means
that either I don't understand the data or the results are buggy.
Can someone clarify the definition of the 'gaps' column in theblast -m8
output format for me?
I thought that the column 'gaps' was basically the number ofcolumns in theHSP that contains a gap character. To test this on my data, Ichecked the
following equality:

GAPS + 2 =
LENGTH - abs(QUERY_END - QUERY_START) + LENGTH - abs(HIT_END -HIT_START)
This says that the number of GAPS should be equal to thedifference betweenthe LENGTH of the alignment minus the distance between the STARTand END
point on either the QUERY or the HIT (+2 for the 'off by one' error
introduced by the two END-START calculations).

e.g.

10-> MMMMMMMM**MMMM*M <-22
  ||||  ||  | |  |
20-> MMMM**MMMMM*M*MM <-31
where MISMATCHES = 7, LENGTH = 16, QUERY_END - QUERY_START = 12,and HIT_END
- HIT_START = 11. The formula gives:

7+2(9) = 16-12(4) + 16-11(5)
The formula is correct for 11,282 out of 12,745 HSPs in my dataset(89%),however it fails for 1,463 cases (11%). Each of these cases has avalue ofMISMATCHES smaller than calculated by the formula. The differenceis usually1 or 2, but is seen to go as high as 96, and scales roughlylinearly with
the size of GAPS.
Did I misunderstand what the value of GAPS is supposed to mean?How come it
does apparently mean what I thought for so much of the data?


Thanks very much for any help on the above.

Dan.

------------- End Forwarded Message -------------
_______________________________________________
Bioperl-l mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/bioperl-l



_______________________________________________
BBB mailing list
[email protected]
http://www.bioinformatics.org/mailman/listinfo/bbb

Re: [BiO BB] [Bioperl-l] Fwd: [blast-help] Fwd: Question about the definition of 'gaps' in blast -m8 output...

Reply via email to