Hi Galt,

Thanks. I understand the difference between an insertion/deletion and a substitution.

I was taking percent identity to mean the percentage of the full query length 
that is both aligned AND identical to the genome.

So, in my example below, I was counting the 50 or so unaligned residues (in 
other words, insertions in the query) plus the two substitutions against the 
query size when calculating percent identity: (354 - 52) / 354 x 100.
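
A minimal sketch of the arithmetic above, for reference. The variable names are illustrative, and 52 is the approximate count of unaligned plus substituted residues quoted in this message, not a value reported by BLAT:

```python
# Query-relative identity as described in this message.
query_size = 354       # length of the NP_000522.3 query, in residues
not_identical = 52     # ~50 unaligned residues + 2 substitutions (approximate)

identical = query_size - not_identical
percent_of_query = identical / query_size * 100

print(f"{identical} of {query_size} residues = {percent_of_query:.1f}% of the query")
```

Under this definition the result is roughly 85%, well below the 99.1% shown in the BLAT output, which is the discrepancy being asked about.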

However, it appears that this is not the case.

Thus, from the hyperlinked output alone, there is no way to tell how many 
residues of the query both hit the genome AND are identical without 
looking at the alignment.
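
One rough way to get that number from the alignment itself is to count residues by case in the details view. The sketch below assumes that the details page capitalizes query residues that match the genome and shows everything else (unaligned or substituted) in lowercase; that convention is an assumption here. The sequence is copied from the original question further down this thread, and the helper name `count_by_case` is made up for illustration:

```python
# Query sequence as displayed in the BLAT details view (copied from this thread).
# ASSUMPTION: uppercase = residue matches the genome,
#             lowercase = unaligned or substituted residue.
ALIGNED_QUERY = """
MLFNLRILLN NAAFRNGHNF MVRNFRCGQP LQNKVQLKGR DLLTLKNFTG EEIKYMLWLS
ADLKFRIKQK GEylpllqgk slgmifekrs trtrlstetG FALLGGHPCF LTTQDIHLGV
NESLtDTARV LSSMaDAVLA RVYKQSDLDT LAKEASIPII NGLSDLYHPI QILADYLTLQ
ehysslkglt lswigdgnni lhsimmsaak fgmhlqaatp kGYEPDASVT KLAEQYAKEN
GTKLLLTNDP LEAAHGGNVL ITDTWISMGQ EEEKKKRLQA FQGYQVTMKT AKVAASDWTF
LHCLPRKPEE VDDEVFYSPR SLVFPEAENR KWTIMAVMVS LLTDYSPQLQ KPKF
"""


def count_by_case(display_text):
    """Return (matching, non_matching) residue counts, ignoring whitespace."""
    residues = [ch for ch in display_text if ch.isalpha()]
    matching = sum(1 for ch in residues if ch.isupper())
    return matching, len(residues) - matching


matching, non_matching = count_by_case(ALIGNED_QUERY)
total = matching + non_matching
print(f"{matching} of {total} query residues match "
      f"({matching / total * 100:.1f}% of the query)")
```

The case convention should be checked against the details page before relying on the count.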


Thanks,

Medha



On Jul 14, 2011, at 4:50 PM, "Galt Barber" <[email protected]> wrote:

> 
> Hi, Medha!
> 
> Do you understand the difference between an insert/deletion
> and a substitution?
> 
> The parts of the query that align do so very well,
> nearly 100% except for two substitutions.
> 
> The insert on the query side is not a substitution.
> It is a part of the query that does not align
> at all with the target DNA genome.
> 
> The fact that there is an insert does penalize
> the identity score a little bit.
> 
> The details of the identity score are explained
> in the link I gave you.
>    http://genome.ucsc.edu/FAQ/FAQblat#blat4
> 
> Typically one must consider coverage
> as well as percent identity to evaluate alignment quality.
> Coverage would typically be how much of the query was aligned.
> 
> Please explain what does not make sense to you.
> 
> -Galt
> 
> 7/14/2011 12:46 PM, Bhagwat, Medha (NIH/OD/ORS) [E]:
>> Sorry, I should have mentioned earlier: I read the FAQ, but the 99% 
>> identity is still not clear to me.
>> 
>> Can you help me understand?  Thanks,
>> 
>> 
>> Medha
>> _____________________________
>> 
>> 
>> -----Original Message-----
>> From: Galt Barber [mailto:[email protected]]
>> Sent: Thursday, July 14, 2011 1:52 PM
>> To: Bhagwat, Medha (NIH/OD/ORS) [E]
>> Cc: '[email protected]'
>> Subject: Re: [Genome] FW: BLAT Identity percent
>> 
>> 
>> Hi, Medha!
>> 
>> Here is how the identity number is calculated:
>> 
>>   http://genome.ucsc.edu/FAQ/FAQblat#blat4
>> 
>> It includes gaps (or inserts depending on perspective) as well
>> as substitutions.
>> 
>> Your blat link shows 2 substitutions,
>> and, on the query side, one insert.
>> 
>> -Galt
>> 
>> 7/14/2011 7:30 AM, Bhagwat, Medha (NIH/OD/ORS) [E]:
>>> Did I miss the response?  Thanks,
>>> 
>>> Medha
>>> 
>>> 
>>> From: Bhagwat, Medha (NIH/OD/ORS) [E]
>>> Sent: Tuesday, July 12, 2011 7:21 PM
>>> To: [email protected]
>>> Cc: Bhagwat, Medha (NIH/OD/ORS) [E]
>>> Subject: BLAT Identity percent
>>> 
>>> Hi,
>>> 
>>> 
>>> 1.       What does the identity number in the result indicate?
>>> 
>>> 2.       How is it calculated?
>>> 
>>> 3.       Example: search against the chimp genome
>>> 
>>> browser<http://genome.ucsc.edu/cgi-bin/hgTracks?position=chrX:38694212-38778403&db=panTro3&ss=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+../trash/hgSs/hgSs_genome_6740_cad6e0.fa&hgsid=202630125>
>>>    
>>> details<http://genome.ucsc.edu/cgi-bin/hgc?o=38694211&g=htcUserAli&i=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_6740_cad6e0.fa+NP_000522.3&c=chrX&l=38694211&r=38778403&db=panTro3&hgsid=202630125>
>>>    NP_000522.3      837     1   354   354  99.1%     X  ++   38694212  38778403  84192
>>> 
>>> 
>>> 
>>> 99.1% identity for the 354 amino acid query should give around 3 mismatches, 
>>> but the alignment shows many more:
>>> 
>>> 
>>> 
>>> MLFNLRILLN NAAFRNGHNF MVRNFRCGQP LQNKVQLKGR DLLTLKNFTG EEIKYMLWLS  60
>>> 
>>> ADLKFRIKQK GEylpllqgk slgmifekrs trtrlstetG FALLGGHPCF LTTQDIHLGV  120
>>> 
>>> NESLtDTARV LSSMaDAVLA RVYKQSDLDT LAKEASIPII NGLSDLYHPI QILADYLTLQ  180
>>> 
>>> ehysslkglt lswigdgnni lhsimmsaak fgmhlqaatp kGYEPDASVT KLAEQYAKEN  240
>>> 
>>> GTKLLLTNDP LEAAHGGNVL ITDTWISMGQ EEEKKKRLQA FQGYQVTMKT AKVAASDWTF  300
>>> 
>>> LHCLPRKPEE VDDEVFYSPR SLVFPEAENR KWTIMAVMVS LLTDYSPQLQ KPKF
>>> 
>>> 
>>> 
>>> 
>>> Thanks.
>>> 
>>> Medha
>>> 
>>> 
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>> 
> 
