Hello, Tony.
One of our engineers had this to say:
Question 1:
There is only one self blastz chain for panTro2. It was done over 6 years
ago with an older version of blastz and with different blastz options than,
for example, the hg19 self chain, which used the more recent aligner lastz.
panTro2 work in: /hive/data/genomes/panTro2/bed/blastzSelf.2006-03-22
hg19 work in: /hive/data/genomes/hg19/bed/lastzSelf.2009-03-19
The options used are quite different and would certainly result in
different coverage measurements. The panTro2 blastz arguments were
minimal, so most settings were defaults:
BLASTZ=blastz.v7.x86_64
BLASTZ_H=2000
BLASTZ_M=200
axtChain arguments: -minScore=10000 -linearGap=medium
hg19 lastz arguments are more specific:
BLASTZ=lastz
BLASTZ_M=254
BLASTZ_Q=human_chimp.v2.q
BLASTZ_O=600
BLASTZ_E=150
BLASTZ_K=10000
BLASTZ_Y=15000
BLASTZ_T=2
where the human_chimp.v2.q matrix is:
          A     C     G     T
    A    90  -330  -236  -356
    C  -330   100  -318  -236
    G  -236  -318   100  -330
    T  -356  -236  -330    90
axtChain arguments: -scoreScheme=human_chimp.v2.q -minScore=2000 -linearGap=medium
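To make the matrix concrete, here is a minimal Python sketch (not part of the UCSC pipeline) that scores an ungapped alignment block with the human_chimp.v2.q values above. Gap penalties (BLASTZ_O, BLASTZ_E) are deliberately left out, and the function name ungapped_score is hypothetical. Note that the matrix penalizes transitions (A/G, C/T at -236) less than transversions (-318 to -356).

```python
# Substitution scores from the human_chimp.v2.q matrix quoted above.
# Rows are the base in the first sequence, columns the base in the second.
HUMAN_CHIMP_V2 = {
    "A": {"A": 90, "C": -330, "G": -236, "T": -356},
    "C": {"A": -330, "C": 100, "G": -318, "T": -236},
    "G": {"A": -236, "C": -318, "G": 100, "T": -330},
    "T": {"A": -356, "C": -236, "G": -330, "T": 90},
}


def ungapped_score(seq1: str, seq2: str, matrix=HUMAN_CHIMP_V2) -> int:
    """Sum substitution scores over an ungapped, equal-length alignment block.

    Hypothetical helper for illustration; gaps are not handled.
    """
    if len(seq1) != len(seq2):
        raise ValueError("ungapped block requires equal-length sequences")
    # Soft-masked (lowercase) bases are scored the same as uppercase here.
    return sum(matrix[a.upper()][b.upper()] for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    # Ten identical bases: 5 A/T (90 each) + 5 C/G (100 each) = 950.
    print(ungapped_score("ACGTACGTAC", "ACGTACGTAC"))
    # An A->G transition in the first column replaces +90 with -236: 624.
    print(ungapped_score("ACGTACGTAC", "GCGTACGTAC"))
```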
Question 2:
No. The genomes in the 46-way alignment are masked sometimes with
RepeatMasker plus TRF and sometimes with WindowMasker plus TRF. Although
this table indicates masking:
http://genomewiki.ucsc.edu/index.php/Hg19_conservation_alignment
it does not indicate which masking method was used in the actual genome
sequence. The individual make documents would need to be examined to find
out whether RepeatMasker or WindowMasker was used for each masked sequence.
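Because the lowercase masking comes from different tools depending on the assembly, the lowercase fraction of a sequence tells you how much of it is masked but not which masker produced it. A minimal sketch, assuming the sequence is available as a plain string (for example one row of a MAF block or a FASTA record); the function name soft_masked_fraction is hypothetical.

```python
def soft_masked_fraction(seq: str) -> float:
    """Fraction of A/C/G/T bases that are lowercase (soft-masked).

    Hypothetical helper: gap characters ('-') and N/n are excluded from the
    denominator. The result says nothing about whether RepeatMasker,
    WindowMasker, or TRF produced the mask.
    """
    bases = [c for c in seq if c in "ACGTacgt"]
    if not bases:
        return 0.0
    masked = sum(1 for c in bases if c.islower())
    return masked / len(bases)


if __name__ == "__main__":
    row = "ACGTacgtNN--ACGT"
    print(soft_masked_fraction(row))  # 4 of 12 counted bases are lowercase
```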
Please contact us again at [email protected] if you have any further
questions.
---
Steve Heitner
UCSC Genome Bioinformatics Group
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Tony Capra
Sent: Monday, July 23, 2012 11:43 PM
To: [email protected]
Subject: [Genome] hg19 vs. pt2 self chains
Hi,
I have a few questions about human and chimp self chains. There is no
chainSelf for pt2 or pt3, but I noticed that a chainSelf track is listed in
the table browser on the genome browser test site. However, when I select
this from the menu, I get the error "Could not find table info for table
chainSelf in db panTro2." Is this a mistake or is there no self chain
available for chimp?
I did find a previous post that discusses pt2 self chains:
https://lists.soe.ucsc.edu/pipermail/genome/2007-December/015138.html
I downloaded the chained chimp alignments from the vsSelf directory on the
test site (http://hgdownload-test.cse.ucsc.edu/goldenPath/panTro2/vsSelf/).
Strangely, there are 50 times more chains for chimp than for hg19, and
indeed, if I filter and process the pt2 chains into a bed file, they cover
32.57% of the genome. This seems quite high compared to human; running
bedCoverage on the hg19 chainSelf track gives a genome-wide coverage of
19.9%. I am able to replicate the hg19 chainSelf track from the human
chains, so I don't think this is a problem in the way I'm processing the
chains. In addition, running the same procedure on the rheMac2 self
alignment chains, which are similar in number to human, produces a self
chain that covers 14% of the rheMac2 genome.
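When comparing figures like 32.57%, 19.9% and 14%, it helps to be explicit about what "coverage" means. A minimal sketch, assuming coverage is defined as the number of distinct bases covered by at least one interval divided by the total genome size, with overlapping intervals merged first so that many chains over the same region are not double-counted. This is an illustration, not the bedCoverage implementation; intervals are BED-style half-open, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count distinct bases covered by at least one interval."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        if end > start:  # skip empty intervals
            by_chrom[chrom].append((start, end))
    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def genome_coverage(intervals: Iterable[Interval], chrom_sizes: Dict[str, int]) -> float:
    """Covered bases as a percentage of the total genome size."""
    return 100.0 * covered_bases(intervals) / sum(chrom_sizes.values())


if __name__ == "__main__":
    sizes = {"chr1": 1000, "chr2": 1000}
    beds = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 900, 1000)]
    print(genome_coverage(beds, sizes))  # 250 of 2000 bases -> 12.5
```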
In short, I have two questions: (1) Is there a pre-computed chimp self chain
track? (2) If not, what's the best strategy to make one? The available
panTro2 self alignments seem very different from those that are available
for hg19 and rheMac2. Do you have a sense of why the
panTro2 alignments have so many more chains? I want to make sure that the
increased coverage of the genome by the self chain I computed is not an
artifact of something about the way the chains were constructed. According
to the comments in the chain files, the matrices used seem to be somewhat
different, but I don't have a great sense of how this influences the
results.
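One way to put the chain counts on a like-for-like footing is to tabulate each chain file by score, since axtChain drops chains below -minScore (10000 for the panTro2 run, 2000 for hg19). A minimal sketch, assuming the standard UCSC chain file layout in which each chain starts with a header line "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id", comment lines start with "#", and all other lines are alignment-block rows. The function names are hypothetical, and counting "at or above" a threshold is this sketch's own convention.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, TextIO


def chain_scores(lines: Iterable[str]) -> Iterator[float]:
    """Yield the score of every chain header line in a UCSC chain file."""
    for line in lines:
        if line.startswith("chain "):
            fields = line.split()
            # fields: chain score tName tSize tStrand tStart tEnd
            #         qName qSize qStrand qStart qEnd id
            yield float(fields[1])


def count_at_thresholds(scores: Iterable[float], thresholds: List[float]) -> Dict[float, int]:
    """Count chains whose score is at or above each threshold."""
    scores = list(scores)
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


def open_chain_file(path: str) -> TextIO:
    """Open a plain or gzip-compressed chain file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


if __name__ == "__main__":
    example = [
        "#comment line",
        "chain 12000 chr1 1000 + 0 100 chr1 1000 + 500 600 1",
        "100",
        "",
        "chain 3000 chr1 1000 + 200 250 chr1 1000 - 10 60 2",
        "50",
        "",
    ]
    print(count_at_thresholds(chain_scores(example), [2000, 10000]))
    # {2000: 2, 10000: 1}
```

With real files, something like `with open_chain_file(path) as fh: count_at_thresholds(chain_scores(fh), [2000, 10000])` gives counts that can be compared across assemblies at the same cutoff.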
Finally, I have one other unrelated question. In the multiz46way
alignments, how consistent is the meaning of the lowercase masking?
Have RepeatMasker and Tandem Repeats Finder been run on all included
genomes?
Thanks,
Tony
_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome