Hi Nimrod,

You could create your own multiple sequence alignments, or you could just use the existing alignments and pull out only the species (and regions) you are interested in.

If you want to create your own alignments, this page should be helpful:

http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto

There are a couple of tools that could help you extract what you want from existing alignments. The first is the "CDS FASTA alignment from multiple alignment" output option in the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables). Select the RefSeq Genes track in hg19, and the CDS FASTA output option will become visible. After hitting "get output" you will see a page where you can select the organisms you want to include in your output. See the user's guide for more info on this option:

http://genome.ucsc.edu/goldenPath/help/hgTablesHelp.html#FASTA

One caveat to be aware of: because only a subset of the species is selected for output, some columns of the alignment will contain only a "-" in every remaining sequence.

Another option is to use Galaxy (http://main.g2.bx.psu.edu/), which is run by our collaborators at Penn State and works in conjunction with the Genome Browser. I have not personally used the tools there, but there are several that look like they might be useful to you -- see "Filter MAF blocks by Species," "Extract MAF blocks given a set of genomic intervals," and "Stitch Gene blocks given a set of coding exon intervals" on the left-hand side of the page under the "Fetch Alignments" header. If you have questions about using Galaxy, their helpdesk address is [email protected].

--
Brooke Rhead
UCSC Genome Bioinformatics Group

On 05/27/11 09:49, nimrod rubinstein wrote:
> Hi,
>
> I think my question is pretty trivial and has probably been raised many
> times before, nevertheless I couldn't find a direct answer for it in the
> archives.
>
> Anyway, I'm interested in building
> Human-Chimp-Orangutan-Rhesus multiple sequence alignments for every human
> refseq gene.
> The way I thought of accomplishing this is to:
> 1. Derive the coding sequence coordinates from the hg19 refGene file for
> every human refseq gene.
> 2. Get the sequences of human and each of the other organisms that map to
> these coordinates from the syntenicNet pairwise alignment files
> (e.g., chr1.hg19.panTro2.synNet.axt.gz).
> 3. Combine these pairwise sequence files to multiple sequence files and run
> my own multiple sequence alignment program.
>
> Does this make sense or is there any other better established way to do
> that?
>
> Thanks a lot,
> Nimrod Rubinstein
> NESCent fellow
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
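
P.S. On the caveat about columns that contain only "-": if those columns get in the way downstream, they can be stripped after export. Below is a minimal Python sketch of that idea, not a UCSC or Galaxy tool. It assumes the alignment has already been read into a dict mapping a sequence name to its aligned sequence; the function name `drop_all_gap_columns` and the short example sequences are made up for illustration.

```python
def drop_all_gap_columns(alignment, gap="-"):
    """Return a copy of the alignment without columns that are all gaps.

    `alignment` maps a sequence name to its aligned sequence (hypothetical
    in-memory layout; parsing the FASTA output is not shown here).
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}

    # Every row of a multiple alignment must have the same number of columns.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    # Keep a column if at least one sequence has a non-gap character there.
    keep = [
        i for i, column in enumerate(zip(*rows))
        if any(char != gap for char in column)
    ]
    return {
        name: "".join(row[i] for i in keep)
        for name, row in zip(names, rows)
    }


# Made-up example: the middle three columns are gaps in every sequence.
example = {
    "hg19": "ATG---GCC",
    "panTro2": "ATG---GCT",
}
trimmed = drop_all_gap_columns(example)
```

Columns in which only some of the sequences have a gap are left alone, so real insertions and deletions between the selected species are preserved.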
