Hi Hiram, Thank you for your prompt reply. I noticed that I neglected to do the axtSort step in my previous attempt and I redid it with the complete sequence. Now the diff output between my version and UCSC version is a lot smaller than before but I'm still seeing differences. Take chrY for example, the first difference occurred in the following entry:
UCSC: 6014 chrY 28725727 28727469 chr17 1974742 1976484 + 133306 Mine: 6014 chrY 28725727 28728984 chr17 1974742 1977914 + 241557 Just to be sure that we are using the same version of the data/programs, I list the exact places where I got them as follows: hg19.2bit: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ rheMac2.2bit: http://hgdownload.cse.ucsc.edu/gbdb/rheMac2/ hg19.rheMac2.net.gz: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsRheMac2/ hg19.rheMac2.all.chain.gz: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsRheMac2/ Also, I'm using the executable compiled by Jim Kent and distributed through his webpage: http://hgwdev.cse.ucsc.edu/~kent/exe/linux/ As well, I wrote up the following bash script to complete my procedures automatically: #!/bin/bash tDb=hg19 qDb=rheMac2 baseDir=~/bcm/data/genomes/$tDb.$qDb synNetDir=$baseDir/synNet chainDir=$baseDir/chain axtDir=$baseDir/axt tNibDir=~/bcm/data/genomes/$tDb/$tDb.2bit qNibDir=~/bcm/data/genomes/$qDb/$qDb.2bit if [ -e $synNetDir ] then echo "$synNetDir already exists" else netFilter -syn $baseDir/$tDb.$qDb.net.gz | netSplit stdin $synNetDir fi if [ -e $chainDir ] then echo "$chainDir already exists" else chainSplit $chainDir $baseDir/$tDb.$qDb.all.chain.gz fi cd $synNetDir if [ -e $axtDir ] then echo "$axtDir already exists" else mkdir $axtDir fi for f in *.net do echo $f if [[ $f =~ .+hap.+ ]] then echo "haplotype ignored (pseudochromosomes)" else inChain=$chainDir/${f/".net"/".chain"} outAxt=$axtDir/${f/".net"/".$tDb.$qDb.synNet.axt"} if [ -e $outAxt ] then echo "$outAxt already exists" else netToAxt $f $inChain $tNibDir $qNibDir stdout | axtSort stdin $outAxt fi fi done Thank you very much for your time in advance. Best, - Jia ________________________________________ From: Hiram Clawson [[email protected]] Sent: Thursday, November 10, 2011 7:01 PM To: Zeng, Jia Cc: [email protected] Subject: Re: [Genome] About generating syntenic net alignments Good Afternoon Jia: I tried this sequence of commands here to see if there is any change in our programs. I obtain the identical results that we delivered in 2009. I ran this csh script: #!/bin/csh -efx cd /tmp/synNet/axtChain # filter net for synteny and create syntenic net mafs netFilter -syn hg19.rheMac2.net.gz \ | netSplit stdin synNet chainSplit chain hg19.rheMac2.all.chain.gz cd /tmp/synNet mkdir axtSynNet foreach f (axtChain/synNet/*.net) netToAxt $f axtChain/chain/$f:t:r.chain \ /scratch/data/hg19/hg19.2bit /scratch/data/rheMac2/rheMac2.2bit stdout \ | axtSort stdin stdout | gzip -c > axtSynNet/$f:t:r.hg19.rheMac2.synNet.axt.gz end Using the files you find on hgdownload: -rw-rw-r-- 1 80508577 Nov 10 16:22 hg19.rheMac2.net.gz -rw-rw-r-- 1 54624334 Nov 10 16:23 hg19.rheMac2.all.chain.gz -rw-rw-r-- 1 816241703 Mar 8 2009 /scratch/data/hg19/hg19.2bit -rw-rw-r-- 1 746913825 Jan 30 2006 /scratch/data/rheMac2/rheMac2.2bit Resulting file is identical to the hgdownload version: -rw-rw-r-- 1 4677891 Nov 10 16:47 chrY.hg19.rheMac2.synNet.axt.gz Check your procedure script and source files to see if they match the example here. --Hiram ----- Original Message ----- From: "Jia Zeng" <[email protected]> To: [email protected] Sent: Thursday, November 10, 2011 3:22:40 PM Subject: [Genome] About generating syntenic net alignments Hi, I need to obtain the syntenic net alignments between several mammalian genomes and I decided to generate them myself. Given the information provided in the script doBlastzChainNet.pl (in particular the doSyntenicNet subroutine), I located the following script: # filter net for synteny and create syntenic net mafs netFilter -syn $tDb.$qDb.net.gz \\ | netSplit stdin synNet chainSplit chain $tDb.$qDb.all.chain.gz cd .. mkdir $successDir foreach f (axtChain/synNet/*.net) netToAxt \$f axtChain/chain/\$f:t:r.chain \\ $defVars{'SEQ1_DIR'} $defVars{'SEQ2_DIR'} stdout \\ | axtSort stdin stdout \\ | axtToMaf -tPrefix=$tDb. -qPrefix=$qDb. stdin \\ $defVars{SEQ1_LEN} $defVars{SEQ2_LEN} \\ stdout \\ | gzip -c > mafSynNet/\$f:t:r:r:r:r:r.maf.gz end rm -fr $runDir/synNet rm -fr $runDir/chain Since the $tDb.$qDb.net.gz, $tDb.$qDb.all.chain.gz files can be downloaded from the goldenPath directory of the download repository, I figured that I can easily complete the generation procedure following the script provided. (I do not need to represent the alignments in MAF form so I stopped at step netToAxt and wrote the output axt to chr?.$tDb.$qDb.synNet.axt, following UCSC's naming convention). In order to ensure that I have successfully generated the synNet.axt files, I downloaded the following file from UCSC website: chrY.hg19.rheMac2.synNet.axt.gz and compared it with the one generated by me. A diff indicated some differences in the two files for instance, in UCSC's version, such an entry exists which is absent in my file, though the entries before and after it are identical. 92 chrY 873749 873891 chr3 158314483 158314625 - 9481 It seems that my version has filtered out more entries than UCSC's version. I am just wondering whether you could kindly offer an input to this? Thank you very much. Best, - Jia _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
