Hi Brooke,

Thanks for helping me investigate. As a follow-up, would you (or your fellow engineers and staff) know approximately what percentage of the D. melanogaster genome is affected by this issue? A rough estimate will suffice. I'm deciding whether to redo the multiway alignment, and such a metric would help with my decision.

Additionally, would you know in which revision of Multiz this issue was resolved? The current version to date is v15.

Many thanks,
Jaaved

--
Jaaved Mohammed, Ph.D. Student of Computational Biology
Tri-Institutional Training Program in Computational Biology and Medicine
(Cornell University - Ithaca, Weill Cornell Medical College, and Memorial Sloan-Kettering Cancer Center)

-----Original Message-----
From: Brooke Rhead [mailto:[email protected]]
Sent: Tuesday, February 22, 2011 4:15 PM
To: Jaaved Mohammed
Cc: [email protected]
Subject: Re: [Genome] Huge block of missing data from insect 15way mulitple alignment

Hi Jaaved,

One of our engineers looked at the region you pointed out and recognized the missing alignments as a known (old) bug in multiz. If you turn on the chain and net tracks, you can see that the supposedly missing sequence is actually present in the pairwise alignments. The bug should be fixed in more recent versions of multiz.

The dm3 15-way multiple alignment is from 2006, and, regrettably, we don't have plans to re-do it, as our funding mandates that we focus on vertebrate species.

If you suspect some other region is also misaligned, you should be able to confirm it by looking at the chains and nets for the organism with the supposedly missing sequence and checking whether the sequence is aligned in them.

We apologize for the inconvenience this may cause.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

Jaaved Mohammed wrote on 2/21/11 12:59 PM:
> Hello,
>
> I'm seeing one particular block of missing data from the 11 non-melanogaster
> species in the insect 15way multiple alignment. The D. melanogaster
> coordinate I enter on the browser is "chr3R:18,118,601-18,118,671" and I get
> the attached image. I should point out the block spanning from 18,118,608 -
> 18,118,647 is missing in all the other 11 species. This interval spans a
> popular microRNA which is highly conserved.
> Along with other evidence, I
> suspect this to be an error and not genuine INDEL in the alignment.
>
> Does anyone know what this is attributed to and how/if we can fix this in
> the multiple alignment?
>
> Thanks for your generous attention.
>
> Regards,
>
> Jaaved
>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
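
For anyone who wants a rough genome-wide figure of the kind asked about in this thread, here is a minimal Python sketch, not a method confirmed by UCSC. It assumes the bug shows up in the MAF file as blocks containing only the reference (dm3) row, which is an unverified guess. It reports the fraction of aligned reference bases that fall in such blocks. Genuinely unalignable sequence looks the same, so the result is at best an upper bound. Any hit would still need to be checked against the chains and nets as described above. The function name, the "dm3." prefix argument, and the toy rows are hypothetical.

```python
def reference_only_fraction(maf_lines, ref_prefix="dm3."):
    """Return (ref_only_bases, total_ref_bases, fraction) for MAF lines.

    Assumption: a block is "suspicious" when the reference row is the only
    sequence ("s") row in it. This is a heuristic, not a multiz diagnostic.
    """
    total = 0        # reference bases summed over all blocks
    ref_only = 0     # reference bases in blocks with no other species
    ref_size = None  # reference row size in the current block, if seen
    others = 0       # number of non-reference rows in the current block

    def close_block():
        nonlocal total, ref_only, ref_size, others
        if ref_size is not None:
            total += ref_size
            if others == 0:
                ref_only += ref_size
        ref_size, others = None, 0

    for line in maf_lines:
        line = line.strip()
        if not line or line == "a" or line.startswith("a "):
            # A blank line or a new "a" line ends the previous block.
            close_block()
        elif line.startswith("s "):
            # MAF "s" row: s src start size strand srcSize text
            fields = line.split()
            src, size = fields[1], int(fields[3])
            if src.startswith(ref_prefix):
                ref_size = size
            else:
                others += 1
        # Other line types (comments, "i", "e", "q") are ignored here.
    close_block()  # flush the last block

    fraction = ref_only / total if total else 0.0
    return ref_only, total, fraction


# Made-up toy data: one normal block and one reference-only block.
EXAMPLE = """##maf version=1
a score=100.0
s dm3.chr3R 100 10 + 1000 ACGTACGTAC
s droSim1.chr3R 200 10 + 1000 ACGTACGTAC

a score=0.0
s dm3.chr3R 110 5 + 1000 ACGTA
"""

result = reference_only_fraction(EXAMPLE.splitlines())
print(result)
```

For a real alignment, an open file handle for the per-chromosome MAF can be passed in place of EXAMPLE.splitlines(), since the function only iterates over lines.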
