Hi Vanessa,

Thank you so much for your reply.

Best,
Hong

On Tue, Dec 6, 2011 at 5:36 PM, Vanessa Kirkup Swing <[email protected]> wrote:

> Hi Hong,
>
> Our primary focus is on vertebrates, which these alignments are all based
> on. This means that we won't be adding yeast, fly, or nematode to the
> multiple alignment. You might be interested in the BlastTab tables, which
> establish orthology by BLAST. Here is a previously answered mailing list
> question that will give you more information:
>
> https://lists.soe.ucsc.edu/pipermail/genome/2011-July/026544.html
>
> With regard to your last question, NM_001024599 aligns to multiple places
> in the genome. One of those places is aligned to calJac1 in the multiple
> alignment, the other is not.
>
> The description page for the conservation track has a lot of information
> that might be of interest to you:
>
> http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&c=chr21&g=cons46way
>
> Hope this helps. If you have further questions, please email the mailing
> list: [email protected].
>
> Vanessa Kirkup Swing
> UCSC Genome Bioinformatics Group
>
> ---------- Forwarded message ----------
> From: Hong Lu <[email protected]>
> Date: Tue, Dec 6, 2011 at 2:15 PM
> Subject: Re: [Genome] UCSC human multiple alignment (protein)
> To: Brian Raney <[email protected]>
> Cc: [email protected]
>
> Hi Brian,
>
> Thank you so much for your quick reply.
>
> I attached two files (diff_seq.txt and miss_seq.txt). The file
> "diff_seq.txt" lists genes whose protein sequences from NCBI differ
> from the protein sequences used to run the multiple alignment at UCSC.
> The file "miss_seq.txt" lists genes that have protein sequences at NCBI
> but no alignment results at UCSC.
>
> Also, are you planning to include yeast, Drosophila, and C. elegans in
> the multiple alignment? Thanks.
>
> For the problem of NM_001024599 and NM_000344, I will take NM_001024599 as
> an example.
>
> If I grep "NM_001024599" and "hg19" from the alignment results, I find
> two records.
>
> >NM_001024599_hg19_1_1 127 0 0 chr1:149398799-149399179-
> MPDPAKSAPAPKKGSKKAVTKVQKKDGEKRKRSRKESYSVYVZEVLKQVHPDTGISSKTMGIMNSFVNDIFERIAGEASRLAHYNKRSTITSREIQTAVRLLLPGELAKHAVSEGTKAVTKYTSSKZ
>
> >NM_001024599_hg19_1_1 127 0 0 chr1:149783498-149783878-
> MPDPAKSAPAPKKGSKKAVTKVQKKDGKKRKRSRKESYSVYVYKVLKQVHPDTGISSKAMGIMNSFVNDIFERIAGEASRLAHYNKRSTITSREIQTAVRLLLPGELAKHAVSEGTKAVTKYTSSKZ
>
> The protein sequences of this gene at the two different positions are
> exactly the same. But when we check the homology of calJac1:
>
> At the first position, it's
>
> >NM_001024599_calJac1_1_1 127 0 0 Contig10067:55107-55487-
> MPDPAKSAPAPKKGSKKAVTKVQKKDGKKRKRSRKESYSVYVYKVLKQVHPDTGISSKAMGIMNSFVNDIFERIAGEASRLAHYNKRSTITSREIQTAVRLLLPGELAKHAVSEGTKAVTKYTSSKZ
>
> At the second position, it's
>
> >NM_001024599_calJac1_1_1 127 0 0
> -------------------------------------------------------------------------------------------------------------------------------
>
> That means we cannot find a homolog of gene NM_001024599 at the second
> position even though we can find one at the first position. I don't know
> the reason for this. Thanks.
>
> Best,
>
> Hong
>
> On Tue, Dec 6, 2011 at 1:39 PM, Brian Raney <[email protected]> wrote:
>
> > Hey Hong,
> >
> > I'll go ahead and regenerate the alignments for the refSeq gene
> > models. These should be ready by tomorrow. I'll send you some mail
> > off-list to tell you when they're done. In the near future we plan
> > to regenerate these files more frequently.
> >
> > I don't understand your question about NM_001024599 and NM_000344.
> > Those two genes have significantly different mRNA sequence, and are
> > found in different places in the genome.
> >
> > I hope this answers your questions. Please reply to this list with
> > any follow up questions.
> >
> > Brian
> >
> > On Tue, Dec 6, 2011 at 11:57 AM, Hong Lu <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I am interested in the multiple alignment of human proteins from UCSC.
> > >
> > > http://hgdownload.cse.ucsc.edu/goldenPath/hg19/multiz46way/alignments/refGene.exonAA.fa.gz
> > >
> > > But this file is relatively old (about two years old). Are you planning
> > > to update this file? If so, would you please tell me when it can be
> > > released? NCBI has updated many genes within the last two years.
> > >
> > > Also, when I read that file, I found that sometimes the human protein
> > > sequences are exactly the same, but the multiple alignments are very
> > > different (such as NM_001024599 and NM_000344). Could you tell me the
> > > reason? Thanks.
> > >
> > > Best,
> > >
> > > Hong
> > > _______________________________________________
> > > Genome maillist - [email protected]
> > > https://lists.soe.ucsc.edu/mailman/listinfo/genome
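
For anyone repeating the check described in this thread (pulling out all records for one RefSeq accession and one assembly, then comparing them), here is a minimal Python sketch. It is not a UCSC tool. It assumes the header layout seen in the records quoted above (name, three numeric fields, then an optional genomic position), and the function names are hypothetical, chosen only for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def records_for(text, accession, db):
    """Return (position, sequence) for records named <accession>_<db>_...

    Assumption: the fifth whitespace-separated header field is the genomic
    position; records with no aligned sequence appear to omit it.
    """
    prefix = f"{accession}_{db}_"
    found = []
    for header, seq in parse_fasta(text):
        fields = header.split()
        if fields and fields[0].startswith(prefix):
            position = fields[4] if len(fields) > 4 else None
            found.append((position, seq))
    return found


def diff_positions(a, b):
    """Return 0-based indices where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_all_gap(seq):
    """True if the sequence consists only of '-' characters."""
    return bool(seq) and set(seq) == {"-"}
```

To use it on the downloaded file, the text could be read with `gzip.open("refGene.exonAA.fa.gz", "rt").read()` and passed to `records_for(text, "NM_001024599", "hg19")`. Calling `diff_positions` on two of the returned sequences confirms residue by residue whether they are identical, and `is_all_gap` flags records such as the second calJac1 entry above.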
