Hi Jaaved,

Since my first email to you, there has been some talk here of re-running
both the multiz and phastCons programs on the dm3 alignments (though we
haven't committed to doing it just yet).

I asked the engineers here for more information on what percentage of
the genome is affected by the bug, and when multiz got fixed. Figuring
out what percentage is affected is non-trivial (and hard to do without
simply re-running multiz and looking to see how many places change!).
Also, the colleagues I spoke to didn't know exactly what the bug was
that caused the missing sequence, just that it got fixed when we got a
new version of multiz.

Thank you for pointing out the problem in the dm3 multiple alignment. We
will get back to you within a week or two with what we decide to do
about re-running multiz.

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 02/28/11 10:28, Jaaved Mohammed wrote:
> Hi Brooke,
>
> Thanks for helping me investigate. On a follow-up note, would you (or
> fellow engineers and staff) know approximately what percentage of the
> D. melanogaster genome is affected by this issue? A rough estimate will
> suffice. I'm deciding whether I should redo the multiway alignment or
> not, and such a metric will help with my decision.
>
> Additionally, would you know in what revision of Multiz this issue was
> resolved? The current version to date is v15.
>
> Many thanks,
> Jaaved
>
> --
> Jaaved Mohammed,
> Ph.D. Student of Computational Biology
> Tri-Institutional Training Program in Computational Biology and Medicine
> (Cornell University - Ithaca, Weill Cornell Medical College, and
> Memorial Sloan-Kettering Cancer Center)
>
> -----Original Message-----
> From: Brooke Rhead [mailto:[email protected]]
> Sent: Tuesday, February 22, 2011 4:15 PM
> To: Jaaved Mohammed
> Cc: [email protected]
> Subject: Re: [Genome] Huge block of missing data from insect 15way
> mulitple alignment
>
> Hi Jaaved,
>
> One of our engineers looked at the region you pointed out and
> recognized the missing alignments as a known (old) bug in multiz. If
> you turn on the chain and net tracks, you can see that the supposedly
> missing sequence is actually present in the pairwise alignments.
>
> The bug should be fixed in more recent versions of multiz. The dm3
> 15-way multiple alignment is from 2006, and, regrettably, we don't have
> plans to re-do it, as our funding mandates that we focus on vertebrate
> species.
>
> If you suspect some other region is also misaligned, you should be able
> to confirm it by looking at the chains and nets for the organism with
> the supposedly missing sequence and seeing whether the sequence is
> aligned in them.
>
> We apologize for the inconvenience this may cause.
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> Jaaved Mohammed wrote on 2/21/11 12:59 PM:
>> Hello,
>>
>> I'm seeing one particular block of missing data from the 11
>> non-melanogaster species in the insect 15way multiple alignment. The
>> D. melanogaster coordinate I enter on the browser is
>> "chr3R:18,118,601-18,118,671" and I get the attached image. I should
>> point out the block spanning from 18,118,608 - 18,118,647 is missing
>> in all the other 11 species. This interval spans a popular microRNA
>> which is highly conserved. Along with other evidence, I suspect this
>> to be an error and not a genuine INDEL in the alignment.
>>
>> Does anyone know what this is attributed to and how/if we can fix this
>> in the multiple alignment?
>>
>> Thanks for your generous attention.
>>
>> Regards,
>>
>> Jaaved
>>
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
