Hello,

I am trying to do a multiple alignment of the genomes of several organisms. To 
make sure I am doing this correctly, I tried to recreate the rheMac2 to rn4 
pairwise alignment that is available from the UCSC FTP server. I found some 
discrepancies between my alignment and the UCSC alignment in the repeat regions.

From looking at src/hg/utils/automation/blastz-run-ucsc, I understand that 
repeats are removed from the Fasta genome files by strip_rpts before running 
lastz.
In src/hg/makeDb/doc/rheMac2.txt, the repeats to be removed are determined by 
running 
DateRepeats chr*.fa.out -query human -comp mouse -comp dog

and then running extractRepeats 1 on the output. I couldn't find the 
extractRepeats program, but I am guessing that I can get the appropriate result 
by running

DateRepeats chr*.fa.out -query human -comp mouse

Then I run selectRpts in blastz-run-ucsc on chr7.fa.out_mus-musculus to 
generate the chr*.rpts file, which I then use with strip_rpts to generate the 
stripped chromosome. I then run lastz on the stripped chromosomes.
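
Conceptually, the stripping step amounts to deleting a list of repeat intervals 
from each chromosome sequence. A minimal Python sketch of that idea follows. It 
is not the actual strip_rpts code: the function name strip_intervals, the 
0-based half-open interval convention, and the toy sequence are all assumptions 
made for illustration.

```python
def strip_intervals(seq, intervals):
    """Return seq with the given intervals deleted.

    Hypothetical helper, not the UCSC strip_rpts implementation.
    intervals: iterable of (start, end) pairs, 0-based, half-open (assumed).
    Overlapping or unsorted intervals are tolerated.
    """
    kept = []
    pos = 0  # first position not yet copied or skipped
    for start, end in sorted(intervals):
        if start > pos:
            kept.append(seq[pos:start])  # keep the non-repeat stretch
        pos = max(pos, end)              # skip over the repeat
    kept.append(seq[pos:])               # keep whatever follows the last repeat
    return "".join(kept)


# Toy example: delete the "CCC" run at positions 3..5
print(strip_intervals("AAACCCGGGTTT", [(3, 6)]))  # prints AAAGGGTTT
```

After such a deletion the stripped sequence has no bases left from the repeat 
intervals, so an aligner run on it sees only the remaining flanks joined 
together.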

However, when I look at the UCSC genome-wide alignment between rheMac2 and rn4, 
it seems that the repeats that should have been removed are included in the 
alignment.

As an example, one of the repeats removed by strip_rpts is the LINE/L2 repeat 
at chr7:87564770..87565324 in rheMac2. But the first aligned chain in 
rheMac2.rn4.all.chain.gz from UCSC starts with

chain 547645084 chr7 169801366 + 87564296 169142947 chr6 147636619 + 64051113 138454126 1
17      2       0
57      0       1
19      0       1
7       7       0
139     6       0
15      1       0
12      7       0
4       3       0
11      0       1
57      1       0
52      0       1
23      6       0
61      0       1    <=== this block overlaps the LINE/L2 repeat
51      2       0
71      1       0
...
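
The overlap can be checked by walking the chain blocks and accumulating target 
coordinates. Below is a minimal Python sketch, assuming the usual chain layout 
(each row is "size dt dq", with chain coordinates 0-based and half-open) and 
assuming the repeat coordinates quoted above are 1-based and inclusive. The 
helper names chain_blocks and first_overlap are made up for this example.

```python
# Rows copied from the chain excerpt above: (size, dt, dq)
ROWS = [
    (17, 2, 0), (57, 0, 1), (19, 0, 1), (7, 7, 0), (139, 6, 0),
    (15, 1, 0), (12, 7, 0), (4, 3, 0), (11, 0, 1), (57, 1, 0),
    (52, 0, 1), (23, 6, 0), (61, 0, 1), (51, 2, 0), (71, 1, 0),
]
T_START = 87564296  # target (rheMac2 chr7) start from the chain header
Q_START = 64051113  # query (rn4 chr6) start from the chain header

# LINE/L2 repeat chr7:87564770..87565324, converted from 1-based inclusive
# (assumed) to 0-based half-open
REPEAT = (87564770 - 1, 87565324)


def chain_blocks(t_start, q_start, rows):
    """Yield (t_begin, t_end, q_begin, q_end) for each ungapped block."""
    t, q = t_start, q_start
    for row in rows:
        size = row[0]
        yield (t, t + size, q, q + size)
        # The final row of a chain carries only a size, hence the fallback.
        dt, dq = (row[1], row[2]) if len(row) == 3 else (0, 0)
        t += size + dt
        q += size + dq


def first_overlap(blocks, interval):
    """Return (1-based block index, t_begin, t_end) of the first block
    whose target range overlaps the interval, or None."""
    lo, hi = interval
    for index, (t_begin, t_end, _q_begin, _q_end) in enumerate(blocks, start=1):
        if t_begin < hi and t_end > lo:
            return index, t_begin, t_end
    return None


print(first_overlap(chain_blocks(T_START, Q_START, ROWS), REPEAT))
# prints (13, 87564742, 87564803)
```

Under these assumptions, block 13 (the "61 0 1" row) covers target positions 
87564742 to 87564803 and so runs past the start of the repeat at 87564770.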

Now I understand that the lastz alignments shouldn't seed in a repeat, but are 
allowed to extend into a repeat. But since we removed the repeat sequences from 
the Fasta file altogether, how can this alignment extend into a repeat?

Best wishes, and many thanks in advance,

Michiel de Hoon
RIKEN Omics Science Center
