Hi Guillermo,

Thank you for your patience. One of our engineers has this to say:

"The other alignments are being dropped because of the global near-best criterion, which saves only alignments that score within 0.25% of the top-scoring alignment. The top-scoring one aligns with 100% identity; the next one is 96%."
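A minimal sketch of the kind of near-best filtering described in the quote above, for readers who want to reproduce the effect on their own alignments. This is not the UCSC implementation: the function name `global_near_best`, the tuple layout, and the use of identity as a stand-in for the alignment score are all assumptions made for illustration.

```python
from collections import defaultdict


def global_near_best(alignments, frac=0.0025):
    """Keep, per query, the alignments scoring within `frac` of that query's best.

    alignments: iterable of (query_name, score, label) tuples (hypothetical layout).
    frac=0.0025 corresponds to the 0.25% figure quoted above.
    """
    # Group all hits by query sequence name.
    by_query = defaultdict(list)
    for query, score, label in alignments:
        by_query[query].append((score, label))

    kept = []
    for query, hits in by_query.items():
        best = max(score for score, _ in hits)
        threshold = best * (1.0 - frac)
        # Anything scoring below the threshold is dropped.
        kept.extend((query, score, label) for score, label in hits if score >= threshold)
    return kept


# Hypothetical scores: identity is used here as a stand-in for the real score.
hits = [
    ("BC096042", 100.0, "top hit"),
    ("BC096042", 96.0, "second hit"),
]
kept = global_near_best(hits)
```

With these made-up values only the top hit survives, because 96.0 falls below the threshold of 99.75.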
I hope this helps to clarify things for you. If you have further questions, please contact the mailing list: [email protected].

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group

----- Original Message -----
From: "Guillermo Parada" <[email protected]>
To: "Vanessa Kirkup Swing" <[email protected]>
Sent: Friday, April 29, 2011 8:00:58 PM
Subject: Re: [Genome] Alignment filter by identity

Hi Vanessa,

I already found Perl code to calculate PIDs, based on your source published in the BLAT FAQ, and I successfully adapted it to my own requirements. I ran it over the Table Browser mm9 data (all_mrna and all_est). I found 1491 cDNAs (0.64%) and 129268 ESTs (2.96%) with a PID lower than 95%. So these data suggest that you did not filter out alignments with a PID below 95%.

I really need you to tell me the criteria applied to filter out the low-quality alignments in the Table Browser data, because they will be my gold-standard criteria for filtering alignments produced by other programs. I'm very interested in the way you distinguish sequences that align only once from those that align to more than one locus in the genome (putative pseudogenes).

Thanks for your kind attention.

Best regards

2011/4/29 Vanessa Kirkup Swing < [email protected] >

Hi Guillermo,

We are currently working on your question. We hope to have some answers for you sometime next week. Thank you for your patience.

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group

----- Original Message -----
From: "Guillermo Parada" < [email protected] >
To: [email protected]
Sent: Wednesday, April 27, 2011 9:49:18 AM
Subject: [Genome] Alignment filter by identity

Hello UCSC Genome Browser staff!

My name is Guillermo. I downloaded the alignment data for the cDNAs and ESTs on the mm9 genome from the Table Browser, and I am now comparing them with my gmap alignment results for the same cDNAs and ESTs. I am writing to you because I'm not clear about the BLAT alignment filter parameters you used to generate the alignments.
The BLAT program specification says "Blat produces .... at the DNA level between two sequences that are of 95% or greater identity ..." ( https://cgwb.nci.nih.gov/goldenPath/help/blatSpec.html ). But in addition you may have configured the pslReps filter options to keep only alignments above a certain amount of coverage (-minCover flag). Did you?

I found some cDNA cases, like BC096042, that have only one alignment in the Table Browser results. But when I put the BC096042 sequence into the web version of BLAT on the Genome Browser, it shows many alignments, which is expected because those alignments are unfiltered. What makes no sense to me is that it shows many alignments with identity over 95%. Why aren't these alignments in the Table Browser? Maybe because in this particular case the BC096042 alignment has 100% identity and the other, suboptimal alignments were automatically deleted. Is that right? (See the 100% identity alignment also in the Genome Browser: http://genome.ucsc.edu/cgi-bin/hgc?hgsid=193400045&o=125111868&t=125115317&g=mrna&i=BC096042 )

In order to use your BLAT alignment results from the Table Browser I need to know your filter parameters. I would also like to confirm the formula (the one used at the hyperlink line) for getting the identity from the psl line. I need it to apply to my gmap results, and after many attempts I arrived at a logical way to calculate it:

[(match*100/(match+mismatch+Q gap bases)) + match*100/(blocksizes summation)]/2

Please tell me if that's right.

Thank you very much for your time. I look forward to your reply.

--
Guillermo Parada.
Undergraduate student, Biochemistry, Pontificia Universidad Católica de Chile.
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

--
Guillermo Parada.
Undergraduate student, Biochemistry, Pontificia Universidad Católica de Chile.
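For anyone wanting to try the identity formula proposed in the original message, here is a minimal sketch that evaluates it on one tab-separated psl line. It implements only the formula as Guillermo wrote it, which the thread does not confirm as the one the Genome Browser uses. The function name `proposed_identity` is hypothetical, "match" is assumed to mean the `matches` column alone (not `repMatches`), "Q gap bases" is assumed to be the `qBaseInsert` column, and the example psl line is made up.

```python
def proposed_identity(psl_line):
    """Percent identity per the formula proposed in the thread (unconfirmed).

    Standard psl columns used (0-based): 0 matches, 1 misMatches,
    5 qBaseInsert, 18 blockSizes (comma-separated, trailing comma).
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches = int(fields[0])
    mismatches = int(fields[1])
    q_gap_bases = int(fields[5])
    block_sizes = [int(x) for x in fields[18].split(",") if x]

    # First term: matches over aligned query bases plus query gap bases.
    term_a = 100.0 * matches / (matches + mismatches + q_gap_bases)
    # Second term: matches over the summed block sizes.
    term_b = 100.0 * matches / sum(block_sizes)
    return (term_a + term_b) / 2.0


# Made-up psl line: 96 matches, 4 mismatches, two blocks of 60 and 40 bases.
example = "\t".join([
    "96", "4", "0", "0", "0", "0", "1", "50", "+",
    "q1", "100", "0", "100", "chr1", "1000", "100", "250",
    "2", "60,40,", "0,60,", "100,210,",
])
pid = proposed_identity(example)
```

For this made-up line both terms equal 96, so `pid` is 96.0. Whether this agrees with the value shown on the Genome Browser details page would need to be checked against the code published in the BLAT FAQ.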
