Hi Jennifer,

I would be grateful if you could give me more details about how to remove unwanted species from the hg18 Conservation track, because I have tried many things without success. I can only get multiz17way from the Table Browser. I then tried to send the output to Galaxy, but that failed, possibly because the file is too large to load in Galaxy. In any case, I would rather not use Galaxy, because I am a little concerned about the method it uses to remove species: it may simply remove lines from the MAF without redoing the multiple alignment.
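To make the concern concrete, here is a minimal Python sketch of what "removing lines from the MAF" amounts to. It is only an illustration: the function name `filter_maf_block` and the example block are hypothetical, and it is not meant to describe what Galaxy or the Genome Browser actually do. It keeps the `s` rows of the wanted species and then deletes columns that have become all gaps. The remaining rows still carry the column structure that the original 17-way alignment imposed, so this is a projection of the existing alignment, not a new alignment.

```python
def filter_maf_block(lines, keep):
    """Keep 's' rows whose species prefix is in `keep`, then drop all-gap columns.

    MAF 's' line fields: s src start size strand srcSize text
    """
    rows = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "s":
            species = fields[1].split(".")[0]  # "hg18.chr1" -> "hg18"
            if species in keep:
                rows.append(fields)
    if not rows:
        return []

    width = len(rows[0][6])
    # A column survives if at least one kept row has a base there.
    keep_cols = [i for i in range(width) if any(r[6][i] != "-" for r in rows)]

    out = []
    for r in rows:
        text = "".join(r[6][i] for i in keep_cols)
        # start/size count ungapped bases, so they are unchanged by this step.
        out.append(" ".join(r[:6] + [text]))
    return out


# Hypothetical three-species block; the gap columns exist only because of monDom4.
block = [
    "a score=100.0",
    "s hg18.chr1    100 4 + 1000 AC--GT",
    "s mm9.chr4     200 4 + 2000 AC--GT",
    "s monDom4.chr2 300 6 + 3000 ACTTGT",
]
result = filter_maf_block(block, {"hg18", "mm9"})
for row in result:
    print(row)
```

Whether a projection like this is good enough depends on the analysis; it cannot recover anything that a fresh alignment of only the chosen species would have placed differently.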
Kind regards,
Yuan

On 23 Jun 2009, at 19:06, Jennifer Jackson wrote:

> Hi Yuan,
>
> The Conservation track in hg18 has control options that would allow
> you to remove any species not in your set. This is a compound track
> - meaning that conserved regions are a part of the sub-track set.
>
> Download/data access info & options:
> http://genome-test.cse.ucsc.edu/FAQ/FAQdownloads#download1
> http://genome-test.cse.ucsc.edu/FAQ/FAQdownloads#download29
> http://genome-test.cse.ucsc.edu/FAQ/FAQtracks#tracks21
>
> If you still want to do this on your own, the Conservation track is
> still a good reference. For each species, the methods we used are
> outlined. Different alignment methods were used for different
> species based on biological reasoning (evolutionary distance,
> quality of genomic, etc). For details about each pair-wise, see the
> individual tracks in that species' genome browser and review the
> creation methods. Some of these may not be on the public server, but
> on the test server at http://genome-test.cse.ucsc.edu/. Please note
> that all tracks on the test server /that are not/ on the regular
> public server have not undergone formal QA and may have sparse
> methods - although you should be able to identify similar tracks on
> the public server (in another species) with complete methods listed.
> However - with your list of genomes - this should not be a problem.
>
> Some notes from a UCSC Scientist that creates this type of data:
>> http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto
>>
>> That is a great write-up by a power-user who managed to sort of
>> duplicate our process locally, for a small genome.
>>
>> And we should also stress upfront that the process requires big
>> compute resources. If they're working with vertebrate genomes,
>> they should have access to a cluster with at least ~50 CPUs (more
>> is better) and if mammalian genomes, at least a few hundred CPUs.
>> Otherwise the compute time is prohibitive.
>> And we should probably tell them that we now use lastz, a greatly
>> improved replacement for blastz.
>>
>> http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.01.50/README.lastz-1.01.50.html
>>
>> Angie
>>
> Thanks,
> Jennifer Jackson
> UCSC Genome Bioinformatics Group
>
> Yuan Hao wrote:
>> Dear List,
>>
>> I would like to create my own multiple alignment file for hg18,
>> mm9, rn4 and canFam2 from UCSC pairwise alignments by using
>> Multiz/TBA aligner. I got following several questions which I am not sure
>> yet after a broad reading. It would be very appreciated if you
>> could shed some lights on them:
>>
>> - Which aligner, Multiz or TBA, would be better if my purpose is
>> to study the motif conservation on the final MAF.
>>
>> - Multiz/TBA takes pairwise alignment in .maf format. While from
>> UCSC I can only find pairwise alignment in .chain, .net or axtNet
>> format. I found there are programs available in Kent source to do
>> the format convert: chainToAxt, netToAxt or axtToMaf. My question
>> is which pairwise format should I download to create multiple
>> alignment?
>>
>> - Is there anything else I missed here during this process?
>>
>> Thank you very much in advance!
>>
>> Kind regards,
>> Yuan
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
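
For the format question in the original post, one possible route from net and chain files to pairwise MAF can be sketched as below. This is a hedged illustration, not a recommendation from the thread on which format to download: every file name is hypothetical, and the argument order and the `-tPrefix`/`-qPrefix` options are as recalled from the usage messages of `netToAxt` and `axtToMaf`, so please confirm them by running each tool with no arguments. The script only builds and prints the command lines; it does not run them. If the axtNet files are downloaded instead, only the `axtToMaf` step would be needed.

```python
# Hypothetical file names throughout; only the tool names come from the thread.
TARGET = "hg18"
QUERIES = ["mm9", "rn4", "canFam2"]


def net_to_axt_cmd(target, query):
    """Usage as recalled: netToAxt in.net in.chain target.2bit query.2bit out.axt"""
    return [
        "netToAxt",
        f"{target}.{query}.net",
        f"{target}.{query}.all.chain",
        f"{target}.2bit",
        f"{query}.2bit",
        f"{target}.{query}.axt",
    ]


def axt_to_maf_cmd(target, query):
    """Usage as recalled: axtToMaf in.axt tSizes qSizes out.maf

    The prefixes produce sequence names such as hg18.chr1 and mm9.chr4,
    the db.chrom form used in UCSC MAF files.
    """
    return [
        "axtToMaf",
        f"-tPrefix={target}.",
        f"-qPrefix={query}.",
        f"{target}.{query}.axt",
        f"{target}.chrom.sizes",
        f"{query}.chrom.sizes",
        f"{target}.{query}.maf",
    ]


commands = []
for q in QUERIES:
    commands.append(net_to_axt_cmd(TARGET, q))
    commands.append(axt_to_maf_cmd(TARGET, q))

# Print for review; pass each list to subprocess.run() once verified.
for cmd in commands:
    print(" ".join(cmd))
```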
