Hi, Medha!

Do you understand the difference between an insert/deletion
and a substitution?

The parts of the query that align do so very well,
at nearly 100% identity except for two substitutions.

The insert on the query side is not a substitution.
It is a part of the query that does not align
at all with the target genome DNA.

The fact that there is an insert does penalize
the identity score a little bit.

The identity score is explained in detail
at the link I gave you:
    http://genome.ucsc.edu/FAQ/FAQblat#blat4
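
As a rough illustration of how an insert lowers the score, here is a
Python sketch modelled on the pslCalcMilliBad C routine shown in that FAQ.
Treat the FAQ as authoritative: this port, its argument names (which
mirror PSL columns), and the example numbers are assumptions for
illustration only, not values taken from your alignment.

```python
import math


def milli_bad(match, mismatch, rep_match,
              q_start, q_end, t_start, t_end,
              q_num_insert, t_num_insert,
              is_protein=False, is_mrna=True):
    """Badness in parts per thousand (sketch of the FAQ's C routine)."""
    size_mul = 3 if is_protein else 1          # protein query: 3 bases per residue
    q_ali_size = size_mul * (q_end - q_start)  # aligned span on the query
    t_ali_size = t_end - t_start               # aligned span on the target
    if min(q_ali_size, t_ali_size) <= 0:
        return 0
    size_dif = q_ali_size - t_ali_size
    if size_dif < 0:
        # For mRNA-style alignments a longer target span (introns) is not penalized.
        size_dif = 0 if is_mrna else -size_dif
    insert_factor = q_num_insert
    if not is_mrna:
        insert_factor += t_num_insert
    total = size_mul * (match + rep_match + mismatch)
    if total == 0:
        return 0
    # Round half up, as C's round() does for non-negative values.
    gap_penalty = math.floor(3 * math.log(1 + size_dif) + 0.5)
    return int(1000 * (mismatch * size_mul + insert_factor + gap_penalty) / total)


def percent_identity(*args, **kwargs):
    """Percent identity is 100 minus one tenth of the milli-badness."""
    return 100.0 - milli_bad(*args, **kwargs) * 0.1


# Invented example 1: 98 matches, 2 substitutions, no inserts.
no_insert = percent_identity(98, 2, 0, 0, 100, 0, 100, 0, 0)
# Invented example 2: same matches, plus one 10-base insert on the query side.
with_insert = percent_identity(98, 2, 0, 0, 110, 0, 100, 1, 0)
print(f"{no_insert:.1f} {with_insert:.1f}")  # prints "98.0 90.0"
```

In this sketch the substitutions and the insert both feed the same
penalty term, which is why a single unaligned stretch of the query can
pull the number down even when the aligned parts match almost perfectly.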

Typically one must consider coverage as well as
percent identity to evaluate alignment quality.
Coverage would typically be how much of the query was aligned.
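
A minimal Python sketch of that coverage idea follows. The function
name, the choice to count matches, mismatches, and repeat matches as
"aligned", and the example numbers are all assumptions for illustration;
other definitions of coverage are in use.

```python
def query_coverage(match, mismatch, rep_match, q_size):
    """Percent of the query that is aligned (one possible definition)."""
    if q_size <= 0:
        raise ValueError("q_size must be positive")
    aligned = match + mismatch + rep_match  # query positions placed on the target
    return 100.0 * aligned / q_size


# Invented numbers: a 354-residue query with 284 aligned positions.
coverage = query_coverage(match=282, mismatch=2, rep_match=0, q_size=354)
print(f"coverage: {coverage:.1f}%")
```

A hit can therefore show high percent identity and still cover only
part of the query, so the two numbers are best read together.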

Please explain what does not make sense to you.

-Galt

7/14/2011 12:46 PM, Bhagwat, Medha (NIH/OD/ORS) [E]:
> Sorry, I should have mentioned earlier that I read the FAQ, but the
> 99% identity is still not clear to me.
>
> Can you help me understand?  Thanks,
>
>
> Medha
> _____________________________
>
>
> -----Original Message-----
> From: Galt Barber [mailto:[email protected]]
> Sent: Thursday, July 14, 2011 1:52 PM
> To: Bhagwat, Medha (NIH/OD/ORS) [E]
> Cc: '[email protected]'
> Subject: Re: [Genome] FW: BLAT Identity percent
>
>
> Hi, Medha!
>
> Here is how the identity number is calculated:
>
>    http://genome.ucsc.edu/FAQ/FAQblat#blat4
>
> It includes gaps (or inserts depending on perspective) as well
> as substitutions.
>
> Your blat link shows 2 substitutions,
> and, on the query side, one insert.
>
> -Galt
>
> 7/14/2011 7:30 AM, Bhagwat, Medha (NIH/OD/ORS) [E]:
>> Did I miss the response?  Thanks,
>>
>> Medha
>>
>>
>> From: Bhagwat, Medha (NIH/OD/ORS) [E]
>> Sent: Tuesday, July 12, 2011 7:21 PM
>> To: [email protected]
>> Cc: Bhagwat, Medha (NIH/OD/ORS) [E]
>> Subject: BLAT Identity percent
>>
>> HI,
>>
>>
>> 1.       What does the identity number in the result indicate?
>>
>> 2.       How is it calculated?
>>
>> 3.       Example: search against the chimp genome
>>
>> browser<http://genome.ucsc.edu/cgi-bin/hgTracks?position=chrX:38694212-38778403&db=panTro3&ss=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+../trash/hgSs/hgSs_genome_6740_cad6e0.fa&hgsid=202630125>
>>    
>> details<http://genome.ucsc.edu/cgi-bin/hgc?o=38694211&g=htcUserAli&i=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_6740_cad6e0.fa+NP_000522.3&c=chrX&l=38694211&r=38778403&db=panTro3&hgsid=202630125>
>>    NP_000522.3      837     1   354   354  99.1%     X  ++   38694212  38778403  84192
>>
>>
>>
>> 99.1% identity for the 354 amino acid query should give around 3 mismatches,
>> but the alignment shows several more.
>>
>>
>>
>> MLFNLRILLN NAAFRNGHNF MVRNFRCGQP LQNKVQLKGR DLLTLKNFTG EEIKYMLWLS  60
>>
>> ADLKFRIKQK GEylpllqgk slgmifekrs trtrlstetG FALLGGHPCF LTTQDIHLGV  120
>>
>> NESLtDTARV LSSMaDAVLA RVYKQSDLDT LAKEASIPII NGLSDLYHPI QILADYLTLQ  180
>>
>> ehysslkglt lswigdgnni lhsimmsaak fgmhlqaatp kGYEPDASVT KLAEQYAKEN  240
>>
>> GTKLLLTNDP LEAAHGGNVL ITDTWISMGQ EEEKKKRLQA FQGYQVTMKT AKVAASDWTF  300
>>
>> LHCLPRKPEE VDDEVFYSPR SLVFPEAENR KWTIMAVMVS LLTDYSPQLQ KPKF
>>
>>
>>
>>
>> Thanks.
>>
>> Medha
>>
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
