Hi Bremen, You have the basic flow correct. Some details added below:
> 2. Align both chromosomes with blastz We now use lastz, an improved version of blastz (http://www.bx.psu.edu/miller_lab/, search for lastz) but blastz is sufficient. The output format of blastz, LAV, must be converted to either PSL or AXT so that axtChain can read the alignments (we have lavToAxt and lavToPsl programs; PSL is more compact). lastz can produce axt directly with the --output=axt option. > 3. Chain using axtChain (QUESTION: I see this program takes two directories > as arguments. If I wish only to compare two chromosomes, would these > directories have only one file each?) Yes, that would work. However, you can also use the -faQ and -faT options, and give fasta files instead of directories. To see a description of all axtChain options, run "axtChain" with no arguments. Here is an example usage of -faQ and -faT: axtChain -faQ -faT in.axt oneChrom.fa otherChrom.fa chroms.chain Another alternative is to convert your fasta files into the compact format 2bit like this: faToTwoBit oneChrom.fa oneChrom.2bit -- then you can give axtChain [and lastz but not blastz] the .2bit file instead of the directory, and don't have to pass -faQ / -faT. > 4. Get size of each chromosome for chain filtering using faSize Yes. Note that the sequence name and size must appear in a tab-separated file, so use the -detailed flag like this: faSize -detailed oneChrom.fa > oneChrom.sizes > 5. Sort and filter chains > a. use chainMergeSort using chain from step 3 > b. prenet with chainPreNet using chain from 5.a., size of target > chromosome gotten from step 4, and size of query chromosome from > step 4 as arguments > 6. Netting? Yes. We pipe the output of chainPreNet to chainNet (and pipe that to netSyntenic) like this: chainPreNet chroms.chain oneChrom.sizes otherChrom.sizes stdout \ | chainNet stdin -minSpace=1 oneChrom.sizes otherChrom.sizes stdout /dev/null \ | netSyntenic stdin chroms.net > 7. Convert to .maf and use phyloFit Yes. For historical reasons we use netToAxt | axtToMaf, we don't have a netToMaf. You can see an example of how phyloFit was run for the D. melanogaster Conservation track in our source tree (kent/src/hg/makeDb/doc/dm2.txt, search for "PHASTCONS 15WAY" and then search for phyloFit). If you have only two species, there is not much phylogenetic information for phyloFit, but I suppose it could still make a substitution rate model. If you have more than two species, you will also need multiz from http://www.bx.psu.edu/miller_lab/ . > 8. Run phastCons to get .wig output which can be uploaded as a conservation > track Yes. Adam Siepel also has a new method of scoring conservation, phyloP, which we offer in addition to phastCons scores on newer Conservation tracks. I have no idea whether one would be more appropriate than the other when working with only two species. > Since I am only comparing two chromosomes at a time, I think only a single > chain is generated which doesn't have to be netted. Am I missing anything? Netting is still necessary, in order to get single-coverage alignments on which conservation scores are computed. axtChain usually produces many chains that cover the same position on the reference genome/chromosome. The netting process selects the highest-scoring chain as the top level, and then fills in gaps (unaligned areas) using the next highest-scoring chain, and then fills in that chain's gaps using the next highest-scoring chain and so on. It is kind of like extracting a global alignment from a sea of chained local alignments. Max Haeussler contributed a very useful genomewiki page about reconstructing our alignment & conservation pipeline (perhaps you have seen it already?): http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto Hope that helps, and please send more questions to [email protected] as you have them, Angie ----- "Bremen Braun" <[email protected]> wrote: > From: "Bremen Braun" <[email protected]> > To: [email protected] > Sent: Thursday, December 3, 2009 9:31:48 AM GMT -08:00 US/Canada Pacific > Subject: [Genome] Generate conservation info for interspecies genome > comparisons > > Hello, > > I have sequences of similar chromosomes for different species that I > want to > compare. I would like to be able to generate a conservation track such > as > the one seen here: > http://tinyurl.com/yz6o6fu > > I looked at an example of steps to be taken. Could you please > verify/clarify > for me? Let's assume I want to compare 2 chromosomes. > 1. Mask both chromosomes > 2. Align both chromosomes with blastz > 3. Chain using axtChain (QUESTION: I see this program takes two > directories > as arguments. If I wish only to compare two chromosomes, would these > directories have only one file each?) > 4. Get size of each chromosome for chain filtering using faSize > 5. Sort and filter chains > a. use chainMergeSort using chain from step 3 > b. prenet with chainPreNet using chain from 5.a., size of target > chromosome gotten from step 4, and size of query chromosome from step > 4 as > arguments > 6. Netting? > 7. Convert to .maf and use phyloFit > 8. Run phastCons to get .wig output which can be uploaded as a > conservation > track > > Since I am only comparing two chromosomes at a time, I think only a > single > chain is generated which doesn't have to be netted. Am I missing > anything? > > Thanks, > Bremen > _______________________________________________ > Genome maillist - [email protected] > https://lists.soe.ucsc.edu/mailman/listinfo/genome _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
