Hi, Medha!
Do you understand the difference between an insert/deletion
and a substitution?
The parts of the query that align do so very well,
nearly 100% except for two substitutions.
The insert on the query side is not a substitution.
It is a part of the query that does not align
at all with the target DNA genome.
The fact that there is an insert does penalize
the identity score a little bit.
The details of the identity score are explained
in the link I gave you.
http://genome.ucsc.edu/FAQ/FAQblat#blat4
Typically one must consider coverage
as well as percent identity to evaluate alignment quality.
Coverage would typically be how much of the query was aligned.
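
To make the two measures concrete, here is a rough sketch in Python.
It is not the BLAT code: the function names are made up for
illustration, the identity shown is a naive one over aligned
positions only (the real calculation in the FAQ also penalizes
inserts), and the numbers in the example are hypothetical, not taken
from your alignment.

```python
def coverage_percent(aligned_query_residues, query_length):
    """Percent of the query that takes part in the alignment."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    return 100.0 * aligned_query_residues / query_length


def naive_identity_percent(matches, mismatches):
    """Percent of aligned positions that match exactly.

    Simplified illustration only: unlike the identity reported by
    BLAT, this ignores inserts and gaps entirely.
    """
    aligned = matches + mismatches
    if aligned <= 0:
        raise ValueError("no aligned positions")
    return 100.0 * matches / aligned


if __name__ == "__main__":
    # Hypothetical case: 300 of 354 query residues aligned,
    # with 2 substitutions among the aligned residues.
    print(coverage_percent(300, 354))
    print(naive_identity_percent(298, 2))
```

In a case like this the identity over the aligned part is very high
even though a sizable piece of the query is not aligned at all, which
is why both numbers are worth looking at together.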
Please explain what does not make sense to you.
-Galt
7/14/2011 12:46 PM, Bhagwat, Medha (NIH/OD/ORS) [E]:
> Sorry, I should have mentioned earlier: I read the FAQ, but the 99%
> identity is still not clear to me.
>
> Can you help me understand? Thanks,
>
>
> Medha
> _____________________________
>
>
> -----Original Message-----
> From: Galt Barber [mailto:[email protected]]
> Sent: Thursday, July 14, 2011 1:52 PM
> To: Bhagwat, Medha (NIH/OD/ORS) [E]
> Cc: '[email protected]'
> Subject: Re: [Genome] FW: BLAT Identity percent
>
>
> Hi, Medha!
>
> Here is how the identity number is calculated:
>
> http://genome.ucsc.edu/FAQ/FAQblat#blat4
>
> It includes gaps (or inserts depending on perspective) as well
> as substitutions.
>
> Your blat link shows 2 substitutions,
> and, on the query side, one insert.
>
> -Galt
>
> 7/14/2011 7:30 AM, Bhagwat, Medha (NIH/OD/ORS) [E]:
>> Did I miss the response? Thanks,
>>
>> Medha
>>
>>
>> From: Bhagwat, Medha (NIH/OD/ORS) [E]
>> Sent: Tuesday, July 12, 2011 7:21 PM
>> To: [email protected]
>> Cc: Bhagwat, Medha (NIH/OD/ORS) [E]
>> Subject: BLAT Identity percent
>>
>> Hi,
>>
>>
>> 1. What does the identity number in the result indicate?
>>
>> 2. How is it calculated?
>>
>> 3. Example: search against the chimp genome
>>
>> browser<http://genome.ucsc.edu/cgi-bin/hgTracks?position=chrX:38694212-38778403&db=panTro3&ss=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+../trash/hgSs/hgSs_genome_6740_cad6e0.fa&hgsid=202630125>
>>
>> details<http://genome.ucsc.edu/cgi-bin/hgc?o=38694211&g=htcUserAli&i=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_6740_cad6e0.fa+NP_000522.3&c=chrX&l=38694211&r=38778403&db=panTro3&hgsid=202630125>
>> NP_000522.3 837 1 354 354 99.1% X ++ 38694212
>> 38778403 84192
>>
>>
>>
>> 99.1% identity for the 354 amino acid query should give around 3 mismatches,
>> but the alignment has several
>>
>>
>>
>> MLFNLRILLN NAAFRNGHNF MVRNFRCGQP LQNKVQLKGR DLLTLKNFTG EEIKYMLWLS 60
>>
>> ADLKFRIKQK GEylpllqgk slgmifekrs trtrlstetG FALLGGHPCF LTTQDIHLGV 120
>>
>> NESLtDTARV LSSMaDAVLA RVYKQSDLDT LAKEASIPII NGLSDLYHPI QILADYLTLQ 180
>>
>> ehysslkglt lswigdgnni lhsimmsaak fgmhlqaatp kGYEPDASVT KLAEQYAKEN 240
>>
>> GTKLLLTNDP LEAAHGGNVL ITDTWISMGQ EEEKKKRLQA FQGYQVTMKT AKVAASDWTF 300
>>
>> LHCLPRKPEE VDDEVFYSPR SLVFPEAENR KWTIMAVMVS LLTDYSPQLQ KPKF
>>
>>
>>
>>
>> Thanks.
>>
>> Medha
>>
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>