Hi Galt,

Thanks. I understand insertion/deletion and substitution of the sequence.
I was taking percent identity to mean the percentage of the query that is aligned AND identical to the genome. So, in my example below, I counted the 50 or so unaligned residues (in other words, the insertion in the query) plus the two substitutions, and calculated percent identity relative to the query size: (354 - 52)/354 × 100.

However, it appears that this is not the case. So, from the hyperlinked output alone, there is no way to tell how many residues of the query hit the genome AND are identical without looking at the alignment.

Thanks,
Medha

On Jul 14, 2011, at 4:50 PM, "Galt Barber" <[email protected]> wrote:

> Hi, Medha!
>
> Do you understand the difference between an insert/deletion
> and a substitution?
>
> The parts of the query that align, align very well:
> nearly 100%, except for two substitutions.
>
> The insert on the query side is not a substitution.
> It is a part of the query that does not align
> at all with the target DNA genome.
>
> The fact that there is an insert does penalize
> the identity score a little bit.
>
> The identity score is explained in detail
> in the link I gave you:
> http://genome.ucsc.edu/FAQ/FAQblat#blat4
>
> Typically one must consider coverage
> as well as percent identity to evaluate alignment quality.
> Coverage would typically be how much of the query was aligned.
>
> Please explain what does not make sense to you.
>
> -Galt
>
> 7/14/2011 12:46 PM, Bhagwat, Medha (NIH/OD/ORS) [E]:
>> Sorry, I should have mentioned earlier: I read the FAQ, but the
>> 99% identity was still not clear to me.
>>
>> Can you help me understand? Thanks,
>>
>> Medha
>> _____________________________
>>
>> -----Original Message-----
>> From: Galt Barber [mailto:[email protected]]
>> Sent: Thursday, July 14, 2011 1:52 PM
>> To: Bhagwat, Medha (NIH/OD/ORS) [E]
>> Cc: '[email protected]'
>> Subject: Re: [Genome] FW: BLAT Identity percent
>>
>> Hi, Medha!
>>
>> Here is how the identity number is calculated:
>>
>> http://genome.ucsc.edu/FAQ/FAQblat#blat4
>>
>> It includes gaps (or inserts, depending on perspective) as well
>> as substitutions.
>>
>> Your BLAT link shows 2 substitutions
>> and, on the query side, one insert.
>>
>> -Galt
>>
>> 7/14/2011 7:30 AM, Bhagwat, Medha (NIH/OD/ORS) [E]:
>>> Did I miss the response? Thanks,
>>>
>>> Medha
>>>
>>> From: Bhagwat, Medha (NIH/OD/ORS) [E]
>>> Sent: Tuesday, July 12, 2011 7:21 PM
>>> To: [email protected]
>>> Cc: Bhagwat, Medha (NIH/OD/ORS) [E]
>>> Subject: BLAT Identity percent
>>>
>>> Hi,
>>>
>>> 1. What does the identity number in the result indicate?
>>>
>>> 2. How is it calculated?
>>>
>>> 3. Example: search against the chimp genome
>>>
>>> browser<http://genome.ucsc.edu/cgi-bin/hgTracks?position=chrX:38694212-38778403&db=panTro3&ss=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+../trash/hgSs/hgSs_genome_6740_cad6e0.fa&hgsid=202630125>
>>>
>>> details<http://genome.ucsc.edu/cgi-bin/hgc?o=38694211&g=htcUserAli&i=../trash/hgSs/hgSs_genome_6740_cad6e0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_6740_cad6e0.fa+NP_000522.3&c=chrX&l=38694211&r=38778403&db=panTro3&hgsid=202630125>
>>>
>>> QUERY        SCORE  START  END  QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
>>> NP_000522.3  837    1      354  354    99.1%     X     ++      38694212  38778403  84192
>>>
>>> 99.1% identity for the 354-amino-acid query should give around 3 mismatches,
>>> but the alignment shows several.
>>>
>>> MLFNLRILLN NAAFRNGHNF MVRNFRCGQP LQNKVQLKGR DLLTLKNFTG EEIKYMLWLS 60
>>> ADLKFRIKQK GEylpllqgk slgmifekrs trtrlstetG FALLGGHPCF LTTQDIHLGV 120
>>> NESLtDTARV LSSMaDAVLA RVYKQSDLDT LAKEASIPII NGLSDLYHPI QILADYLTLQ 180
>>> ehysslkglt lswigdgnni lhsimmsaak fgmhlqaatp kGYEPDASVT KLAEQYAKEN 240
>>> GTKLLLTNDP LEAAHGGNVL ITDTWISMGQ EEEKKKRLQA FQGYQVTMKT AKVAASDWTF 300
>>> LHCLPRKPEE VDDEVFYSPR SLVFPEAENR KWTIMAVMVS LLTDYSPQLQ KPKF
>>>
>>> Thanks.
>>>
>>> Medha
>>>
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
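The BLAT FAQ Galt links to (http://genome.ucsc.edu/FAQ/FAQblat#blat4) gives the web identity as 100.0 - pslCalcMilliBad(psl, TRUE) * 0.1. The Python below is a minimal sketch of that calculation as described in the FAQ. It is not the kent source itself. The names Psl, psl_calc_milli_bad and percent_identity are hypothetical. The counts in the example are invented because the real PSL fields for this alignment are not shown in the thread, so the printed value is not meant to reproduce the 99.1% above. The point it illustrates is that the query insert is charged once per insert event, not once per residue. As a result, a 50-residue unaligned stretch lowers identity only slightly, and the denominator counts aligned residues only.

```python
import math
from dataclasses import dataclass


@dataclass
class Psl:
    """Hypothetical container for the PSL fields the calculation needs."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int
    match: int
    rep_match: int
    mis_match: int
    q_num_insert: int  # number of insert EVENTS in the query, not residues
    t_num_insert: int  # number of insert EVENTS in the target, not bases
    is_protein: bool = False


def psl_calc_milli_bad(psl: Psl, is_mrna: bool) -> int:
    """Badness in parts per thousand, sketched from the BLAT FAQ."""
    # Protein query coordinates are in amino acids; scale to nucleotides.
    size_mul = 3 if psl.is_protein else 1
    q_ali_size = size_mul * (psl.q_end - psl.q_start)
    t_ali_size = psl.t_end - psl.t_start
    if min(q_ali_size, t_ali_size) <= 0:
        return 0
    size_dif = q_ali_size - t_ali_size
    if size_dif < 0:
        # mRNA/protein vs genome: a longer target span (introns) is not penalized.
        size_dif = 0 if is_mrna else -size_dif
    # Each insert event costs 1, however many residues it spans.
    insert_factor = psl.q_num_insert
    if not is_mrna:
        insert_factor += psl.t_num_insert
    # Denominator: aligned residues only (unaligned query residues are excluded).
    total = size_mul * (psl.match + psl.rep_match + psl.mis_match)
    if total == 0:
        return 0
    # C round() rounds half away from zero; the argument is >= 0 here.
    size_penalty = math.floor(3 * math.log(1 + size_dif) + 0.5)
    bad = psl.mis_match * size_mul + insert_factor + size_penalty
    # The C code stores the result in an int, truncating the fraction.
    return int(1000 * bad / total)


def percent_identity(psl: Psl) -> float:
    """Web BLAT identity per the FAQ: 100.0 - pslCalcMilliBad(psl, TRUE) * 0.1."""
    return 100.0 - psl_calc_milli_bad(psl, is_mrna=True) * 0.1


# Invented counts: a 354-aa protein query, 304 residues aligned with
# 2 substitutions, and ONE ~50-residue unaligned stretch (one query insert).
# t_num_insert is also invented; it is ignored when is_mrna=True.
example = Psl(q_start=0, q_end=354, t_start=38694211, t_end=38778403,
              match=302, rep_match=0, mis_match=2,
              q_num_insert=1, t_num_insert=4, is_protein=True)
print(f"{percent_identity(example):.1f}%")
```

With these invented counts, the unaligned stretch adds 1 to the numerator while the two substitutions add 6. That is why the identity stays near 99% even though roughly 50 query residues never aligned.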

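Galt's coverage point and Medha's (354 - 52)/354 × 100 calculation measure something different from BLAT's identity: both divide by the full query size. The following is a minimal sketch, assuming the sizes of the aligned blocks are available, for example from the blockSizes column of a PSL record. These sizes are assumed here to be in query residues (amino acids for a protein query). The helper names and the block sizes are hypothetical. The sizes were chosen so that about 50 residues are unaligned and 2 aligned residues are substitutions.

```python
def coverage_percent(block_sizes, q_size):
    """Hypothetical helper: percent of the query inside aligned blocks."""
    return 100.0 * sum(block_sizes) / q_size


def identical_over_query_percent(block_sizes, mis_match, q_size):
    """Hypothetical helper: aligned AND identical residues as a percent of
    the whole query. Assumes every aligned residue that is not a mismatch
    counts as identical."""
    return 100.0 * (sum(block_sizes) - mis_match) / q_size


# Invented block sizes (amino acids) for a 354-aa query: 304 residues
# aligned, so about 50 are unaligned; 2 aligned residues are substitutions.
blocks = [40, 60, 100, 104]
q_size = 354
print(f"coverage: {coverage_percent(blocks, q_size):.1f}%")
print(f"identical over query: {identical_over_query_percent(blocks, 2, q_size):.1f}%")
```

With these numbers the second figure is exactly Medha's (354 - 52)/354 × 100, about 85.3%, and coverage is about 85.9%. Reporting coverage alongside identity is what separates "how much aligned" from "how well it aligned".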