Hi Bremen, 

Regarding the error messages from trfBig/trf -- what do you see if you run the
trf command (copied from the maybeSystem(...) message) in your shell? I'm hoping
that trf itself will print a more informative error message. If trf is
failing to run on your sequence, then the trf command line and sequence would 
be a good test case for the TRF developers. 
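Here is a minimal sketch of re-running a command by hand while keeping its error output, written as a POSIX shell function. The function name rerun_logged and the log file name are hypothetical and are not part of the kent tools. The commented example reuses the trf arguments from the maybeSystem message quoted below. It assumes the space in "./ chr2.tf" was introduced by the mail archive, and that the temporary chr2.tf file still exists.

```shell
# Hypothetical helper (not part of the kent tools): run a command exactly as
# given, save its stderr in a log file, and report the command's own exit status.
# Uses plain (global) variables so it also works in shells without "local".
rerun_logged() {
  log="$1"
  shift
  "$@" 2> "$log"
  status=$?
  echo "exit status: $status (stderr saved in $log)"
  return "$status"
}

# Example, run from the directory trfBig used ("cd ." in the message below);
# assumes trf is on PATH and the temporary file ./chr2.tf still exists:
# rerun_logged trf.stderr.log trf ./chr2.tf 2 7 7 80 10 50 2000 -m
```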

To help determine why we're not finding the expected sequence name in the hash
of chrom names to sizes, can you send the following:
* contents of your chr1.sizes and chr2.sizes files 
* header lines of the two fasta files passed to lastz/blastz 
* the first line that begins with "chain " in all.chain 
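The following sketch gathers this information and flags a name mismatch directly. It is a shell function, and check_chain_names is a hypothetical name. It assumes the .sizes files are non-empty, two-column name/size files as written by faSize -detailed. It also assumes each chain header follows the UCSC chain format (chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id), so field 3 is the target name and field 8 is the query name.

```shell
# Hypothetical helper: show the sizes files and the first chain header, then
# list chain target/query names that have no entry in the matching sizes file
# (those are the lookups that make chainPreNet/chainNet fail in hashMustFindVal).
check_chain_names() {
  tsizes="$1"; qsizes="$2"; chains="$3"
  echo "== sizes files =="
  cat "$tsizes" "$qsizes"
  echo "== first chain header =="
  awk '/^chain / { print; exit }' "$chains"
  echo "== chain names missing from sizes files =="
  # f counts input files: 1 = target sizes, 2 = query sizes, 3 = chains.
  awk 'FNR == 1 { f++ }
       f == 1 { t[$1] = 1; next }
       f == 2 { q[$1] = 1; next }
       /^chain / {
         if (!($3 in t)) print "target name not in " ARGV[1] ": " $3
         if (!($8 in q)) print "query name not in " ARGV[2] ": " $8
       }' "$tsizes" "$qsizes" "$chains" | sort -u
}
```

Usage, with the file names from the command line quoted below (chr1.fa and chr2.fa are placeholder names for the two fasta files):

check_chain_names chr1.sizes chr2.sizes all.chain
grep '^>' chr1.fa chr2.fa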

Angie 

----- "Bremen Braun" <[email protected]> wrote: 
> From: "Bremen Braun" <[email protected]> 
> To: "Angie Hinrichs" <[email protected]> 
> Cc: [email protected] 
> Sent: Tuesday, December 8, 2009 11:51:37 AM GMT -08:00 US/Canada Pacific 
> Subject: Re: [Genome] Generate conservation info for interspecies genome 
> comparisons 
> 
> Angie, 
> Thanks for the reply. I now have some questions regarding tool behavior. 
> 
> 1. Following Max's guide, I started by masking both fasta files to be 
> compared with trfBig using Tandem Repeats Finder v4.04. However, upon running 
> I get messages from stdout such as the following: 
> 
> Freeing Memory... 
> Resolving output... 
> Done.maybeSystem: system(cd .; trf ./ chr.tf 2 7 7 80 10 50 2000 -m ) failed 
> (exit status 1): Unknown error: 0 
> 
> and 
> 
> Freeing Memory... 
> Resolving output... 
> Done.maybeSystem: system(cd .; trf ./ chr2.tf 2 7 7 80 10 50 2000 -m ) failed 
> (exit status 1): Cannot allocate memory 
> 
> I am running OS X v10.6.2 on a quad core with 6 GB of RAM and using the 
> newest trfBig as far as I know. 
> 
> 2. chainPreNet generates the following error: 
> 
> Got 1 chroms in directory/chr1.sizes, 1 in directory/chr2.sizes 
> hashMustFindVal: 'chr2' not found 
> Finishing nets 
> writing stdout 
> writing /dev/null 
> Couldn't open /proc/self/stat , No such file or directory 
> hashMustFindVal: 'chr2' not found 
> Got 1 chroms in directory/chr1.sizes, 1 in directory/chr2.sizes 
> Finishing nets 
> writing stdout 
> writing /dev/null 
> Couldn't open /proc/self/stat , No such file or directory 
> 
> when running line: 
> 
> chainMergeSort *.chain > all.chain 
> chainPreNet all.chain chr1.sizes chr2.sizes stdout \ 
> | chainNet stdin -minSpace=1 chr1.sizes chr2.sizes stdout /dev/null \ 
> | netSyntenic stdin out.net 
> 
> All the tools I am using were compiled from the newest version of jksrc. Any 
> ideas what the problems are? 
> 
> Bremen 
> 
> 
> On Mon, Dec 7, 2009 at 11:57 PM, Angie Hinrichs < [email protected] > wrote: 
>
> Hi Bremen,
> 
> You have the basic flow correct. Some details added below: 
> 
> 
> > 2. Align both chromosomes with blastz 
> 
> We now use lastz, an improved version of blastz
> (http://www.bx.psu.edu/miller_lab/, search for lastz), but blastz is
> sufficient. The output format of blastz, LAV, must be converted to
> either PSL or AXT so that axtChain can read the alignments (we have
> lavToAxt and lavToPsl programs; PSL is more compact). lastz can
> produce axt directly with the --format=axt option.
> 
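If a later step cannot find a sequence name, it can help to list the names that the alignments actually use. These names should match the first column of the .sizes files. The following is a minimal sketch. It assumes the standard AXT layout, in which each alignment starts with a 9-field summary line (number, target name, start, end, query name, start, end, strand, score). axt_names is a hypothetical helper name.

```shell
# Hypothetical helper: print the distinct target and query sequence names
# found in the 9-field AXT summary lines of the given file(s).
axt_names() {
  awk '/^[0-9]+ / && NF == 9 { print "target", $2; print "query", $5 }' "$@" | sort -u
}
```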
> 
> 
> > 3. Chain using axtChain (QUESTION: I see this program takes two directories 
> > as arguments. If I wish only to compare two chromosomes, would these 
> > directories have only one file each?) 
> 
> Yes, that would work. However, you can also use the -faQ and -faT 
> options, and give fasta files instead of directories. To see a 
> description of all axtChain options, run "axtChain" with no arguments. 
> Here is an example usage of -faQ and -faT: 
> 
> axtChain -faQ -faT in.axt oneChrom.fa otherChrom.fa chroms.chain 
> 
> Another alternative is to convert your fasta files into the compact 
> format 2bit like this: 
> 
> faToTwoBit oneChrom.fa oneChrom.2bit 
> 
> -- then you can give axtChain [and lastz but not blastz] the .2bit file 
> instead of the directory, and don't have to pass -faQ / -faT. 
> 
> 
> 
> > 4. Get size of each chromosome for chain filtering using faSize 
> 
> Yes. Note that the sequence name and size must appear in a 
> tab-separated file, so use the -detailed flag like this: 
> 
> faSize -detailed oneChrom.fa > oneChrom.sizes 
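faSize -detailed is the tool to use here. The awk sketch below only illustrates the two-column "name<TAB>size" layout that chainPreNet and chainNet read. It assumes the sequence name is the first whitespace-delimited word after ">" on the fasta header line. fasta_sizes is a hypothetical name.

```shell
# Hypothetical illustration only (use faSize -detailed in practice): print
# "name<TAB>length" per fasta record, taking the first word after ">" as the
# name and counting every non-whitespace character of the sequence lines.
fasta_sizes() {
  awk '
    /^>/ {
      if (name != "") printf "%s\t%d\n", name, len
      name = substr($1, 2); len = 0; next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (name != "") printf "%s\t%d\n", name, len }
  ' "$@"
}
```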
> 
> 
> 
> > 5. Sort and filter chains 
> > a. use chainMergeSort using chain from step 3 
> > b. prenet with chainPreNet using chain from 5.a., size of target 
> > chromosome gotten from step 4, and size of query chromosome from 
> > step 4 as arguments 
> > 6. Netting? 
> 
> Yes. We pipe the output of chainPreNet to chainNet (and pipe that to 
> netSyntenic) like this: 
> 
> chainPreNet chroms.chain oneChrom.sizes otherChrom.sizes stdout \ 
> | chainNet stdin -minSpace=1 oneChrom.sizes otherChrom.sizes stdout /dev/null \
> | netSyntenic stdin chroms.net 
> 
> 
> 
> > 7. Convert to .maf and use phyloFit 
> 
> Yes. For historical reasons we use netToAxt | axtToMaf; we don't have
> a netToMaf. You can see an example of how phyloFit was run for the 
> D. melanogaster Conservation track in our source tree 
> (kent/src/hg/makeDb/doc/dm2.txt, search for "PHASTCONS 15WAY" and then 
> search for phyloFit). 
> 
> If you have only two species, there is not much phylogenetic 
> information for phyloFit, but I suppose it could still make a 
> substitution rate model. If you have more than two species, you will 
> also need multiz from http://www.bx.psu.edu/miller_lab/ . 
> 
> 
> 
> > 8. Run phastCons to get .wig output which can be uploaded as a conservation 
> > track 
> 
> Yes. Adam Siepel also has a new method of scoring conservation, 
> phyloP, which we offer in addition to phastCons scores on newer 
> Conservation tracks. I have no idea whether one would be more 
> appropriate than the other when working with only two species. 
> 
> 
> 
> > Since I am only comparing two chromosomes at a time, I think only a single 
> > chain is generated which doesn't have to be netted. Am I missing anything? 
> 
> Netting is still necessary, in order to get single-coverage alignments 
> on which conservation scores are computed. axtChain usually produces 
> many chains that cover the same position on the reference 
> genome/chromosome. The netting process selects the highest-scoring 
> chain as the top level, and then fills in gaps (unaligned areas) using 
> the next highest-scoring chain, and then fills in that chain's gaps 
> using the next highest-scoring chain and so on. It is kind of like 
> extracting a global alignment from a sea of chained local alignments. 
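The greedy idea can be illustrated with a deliberately simplified toy, sketched in shell and awk. This is not chainNet's algorithm. Each chain is reduced to a single interval on the target, given as hypothetical input lines of the form "id score tStart tEnd". Gaps inside chains, the query side, and thresholds such as -minSpace are all ignored. Chains are processed from highest to lowest score, and each one keeps only the target bases that no higher-scoring chain has already claimed. That is what produces single coverage.

```shell
# Toy illustration (NOT chainNet): input lines are "id score tStart tEnd",
# half-open target intervals. Highest score goes first; each chain claims
# only still-unclaimed bases, and every maximal claimed run is printed as
# "id start end". Works base by base, so it is only for tiny examples.
toy_net() {
  sort -k2,2nr "$1" | awk '
    {
      id = $1; s = $3; e = $4; runStart = -1
      for (p = s; p <= e; p++) {
        free = (p < e) && !(p in owner)
        if (free) {
          owner[p] = id
          if (runStart < 0) runStart = p
        } else if (runStart >= 0) {
          print id, runStart, p
          runStart = -1
        }
      }
    }'
}
```

For example, take chains c1 (score 100, 10-20), c2 (score 50, 0-30) and c3 (score 10, 5-40). The top chain keeps 10-20. c2 fills 0-10 and 20-30 on either side of it. c3 only contributes 30-40.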
> 
> 
> Max Haeussler contributed a very useful genomewiki page about 
> reconstructing our alignment & conservation pipeline (perhaps you have 
> seen it already?): 
> 
> 
> http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto 
> 
> Hope that helps, and please send more questions to [email protected] 
> as you have them, 
> 
> Angie 
> 
> ----- "Bremen Braun" < [email protected] > wrote: 
> 
> > From: "Bremen Braun" < [email protected] > 
> > To: [email protected] 
> > Sent: Thursday, December 3, 2009 9:31:48 AM GMT -08:00 US/Canada Pacific 
> > Subject: [Genome] Generate conservation info for interspecies genome 
> > comparisons 
>
> > 
> > Hello, 
> > 
> > I have sequences of similar chromosomes for different species that I
> > want to compare. I would like to be able to generate a conservation
> > track such as the one seen here:
> > http://tinyurl.com/yz6o6fu
> > 
> > I looked at an example of steps to be taken. Could you please
> > verify/clarify for me? Let's assume I want to compare 2 chromosomes.
> > 1. Mask both chromosomes
> > 2. Align both chromosomes with blastz
> > 3. Chain using axtChain (QUESTION: I see this program takes two
> > directories as arguments. If I wish only to compare two chromosomes,
> > would these directories have only one file each?)
> > 4. Get size of each chromosome for chain filtering using faSize
> > 5. Sort and filter chains
> > a. use chainMergeSort using chain from step 3
> > b. prenet with chainPreNet using chain from 5.a., size of target
> > chromosome gotten from step 4, and size of query chromosome from
> > step 4 as arguments
> > 6. Netting?
> > 7. Convert to .maf and use phyloFit
> > 8. Run phastCons to get .wig output which can be uploaded as a
> > conservation track
> > 
> > Since I am only comparing two chromosomes at a time, I think only a
> > single chain is generated which doesn't have to be netted. Am I
> > missing anything?
> > 
> > Thanks, 
> > Bremen 
>
> > _______________________________________________ 
> > Genome maillist - [email protected] 
> > https://lists.soe.ucsc.edu/mailman/listinfo/genome 
> 
> 