Hello Michiel,

We do not remove all repeats; we remove only the lineage-specific 
repeats. If your RepeatMasker scripts are failing, it is possible 
that you have not been able to produce the actual lineage-specific repeats.
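As a quick check on the example in the quoted message below, the chain blocks can be walked along the target (rheMac2) to see which one first reaches into the LINE/L2 element. This is a minimal Python sketch. It assumes the UCSC chain format (0-based tStart in the header, block lines of "size dt dq") and that the repeat coordinates chr7:87564770..87565324 are 1-based and inclusive. `target_blocks` and `first_overlap` are illustrative helpers, not UCSC tools. Only the blocks shown in the quote are used.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # 0-based, half-open (start, end) on the target

# Header of the first rheMac2.rn4 chain as quoted:
# chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
HEADER = ("chain 547645084 chr7 169801366 + 87564296 169142947 "
          "chr6 147636619 + 64051113 138454126 1")

# The first aligning blocks as quoted ("size dt dq"); the real chain continues.
BLOCK_LINES = """\
17 2 0
57 0 1
19 0 1
7 7 0
139 6 0
15 1 0
12 7 0
4 3 0
11 0 1
57 1 0
52 0 1
23 6 0
61 0 1
51 2 0
71 1 0"""


def target_blocks(header: str, block_lines: str) -> List[Block]:
    """Convert chain block lines into target coordinates."""
    t_pos = int(header.split()[5])  # tStart (0-based)
    blocks = []
    for line in block_lines.splitlines():
        nums = [int(x) for x in line.split()]
        size = nums[0]
        blocks.append((t_pos, t_pos + size))
        # dt is the target gap before the next block; the chain's last line has none
        t_pos += size + (nums[1] if len(nums) == 3 else 0)
    return blocks


def first_overlap(blocks: List[Block], start: int, end: int) -> Optional[int]:
    """Index of the first block overlapping [start, end), or None."""
    for idx, (b_start, b_end) in enumerate(blocks):
        if b_start < end and start < b_end:
            return idx
    return None


# LINE/L2 at chr7:87564770..87565324 (assumed 1-based inclusive) -> 0-based half-open
repeat_start, repeat_end = 87564770 - 1, 87565324

blocks = target_blocks(HEADER, BLOCK_LINES)
i = first_overlap(blocks, repeat_start, repeat_end)
if i is not None:
    print(i + 1, blocks[i])  # 13 (87564742, 87564803)
```

The first block to reach into the repeat is the 13th, the one flagged in the quote. L2 is an old LINE family shared by primates and rodents, so it would not be expected among the lineage-specific repeats that are stripped.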

For more info on the RepeatMasker scripts used to construct these files, 
please see the associated makedoc.


Hopefully this information was helpful and answers your question. If you 
have further questions or require clarification, feel free to contact the 
mailing list at [email protected].

Regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu


On 06/21/11 21:00, Michiel de Hoon wrote:
> Hello,
> 
> I am trying to do a multiple alignment of the genomes of several organisms. 
> To make sure I am doing this correctly, I tried to recreate the rheMac2 to 
> rn4 pairwise alignment that is available from the UCSC FTP server. I found 
> some discrepancies between my alignment and the UCSC alignment in the repeat 
> regions.
> 
> From looking at src/hg/utils/automation/blastz-run-ucsc, I understand that 
> repeats are removed from the Fasta genome files by strip_rpts before running 
> lastz.
> In src/hg/makeDb/doc/rheMac2.txt, the repeats to be removed are determined by 
> running 
> DateRepeats chr*.fa.out -query human -comp mouse -comp dog
> 
> and then running extractRepeats 1 on the output. I couldn't find the 
> extractRepeats program, but I am guessing that I can get the appropriate 
> result by
> 
> DateRepeats chr*.fa.out -query human -comp mouse
> 
> Then I run selectRpts in blastz-run-ucsc on chr7.fa.out_mus-musculus to 
> generate the chr*.rpts file, which I then use with strip_rpts to generate the 
> stripped chromosome. I then run lastz on the stripped chromosomes.
> 
> However, when I look at the UCSC genome-wide alignment between rheMac2 and 
> rn4, it seems that the repeats that should have been removed are included in 
> the alignment.
> 
> As an example, one of the repeats removed by strip_rpts is the LINE/L2 repeat 
> at chr7:87564770..87565324 in rheMac2. But the first aligned chain in 
> rheMac2.rn4.all.chain.gz from UCSC starts with
> 
> chain 547645084 chr7 169801366 + 87564296 169142947 chr6 147636619 + 64051113 138454126 1
> 17      2       0
> 57      0       1
> 19      0       1
> 7       7       0
> 139     6       0
> 15      1       0
> 12      7       0
> 4       3       0
> 11      0       1
> 57      1       0
> 52      0       1
> 23      6       0
> 61      0       1    <=== this block overlaps the LINE/L2 repeat
> 51      2       0
> 71      1       0
> ...
> 
> Now I understand that the lastz alignments shouldn't seed in a repeat, but 
> are allowed to extend into a repeat. But since we removed the repeat 
> sequences from the Fasta file altogether, how can this alignment extend into 
> a repeat?
> 
> Best wishes, and many thanks in advance,
> 
> Michiel de Hoon
> RIKEN Omics Science Center
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
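On the question in the quoted message (how an alignment can extend into a repeat that was removed): if a repeat really is excised before lastz runs and the alignment coordinates are then mapped back onto the original chromosome, no aligned base can fall inside it. A block that overlaps the LINE/L2 element therefore suggests that this element was not among the repeats stripped in the UCSC run. The toy Python sketch below illustrates the idea under stated assumptions: `strip_intervals` and `to_original` are hypothetical helpers, not the strip_rpts code, and intervals are 0-based and half-open.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end)
Segment = Tuple[int, int]   # (start in stripped sequence, start in original)


def strip_intervals(seq: str, repeats: List[Interval]) -> Tuple[str, List[Segment]]:
    """Remove the given intervals from seq.

    Returns the stripped sequence plus one (stripped_start, original_start)
    pair per retained segment, for mapping coordinates back.
    """
    pieces, segments = [], []
    pos = 0      # current position in the original sequence
    out_len = 0  # current length of the stripped sequence
    for start, end in sorted(repeats):
        if start > pos:  # keep the non-repeat stretch before this repeat
            segments.append((out_len, pos))
            pieces.append(seq[pos:start])
            out_len += start - pos
        pos = max(pos, end)  # skip the repeat (tolerates overlapping intervals)
    if pos < len(seq):  # keep the tail after the last repeat
        segments.append((out_len, pos))
        pieces.append(seq[pos:])
    return "".join(pieces), segments


def to_original(stripped_pos: int, segments: List[Segment]) -> int:
    """Map a position in the stripped sequence back to the original sequence."""
    starts = [s for s, _ in segments]
    i = bisect_right(starts, stripped_pos) - 1
    seg_start, orig_start = segments[i]
    return orig_start + (stripped_pos - seg_start)


# Toy example: a 4-bp "repeat" at [4, 8) is excised before aligning.
seq = "AAAACCCCGGGG"
stripped, segments = strip_intervals(seq, [(4, 8)])
print(stripped)  # AAAAGGGG

# Every base of an alignment found on the stripped sequence maps outside the repeat.
mapped = [to_original(p, segments) for p in range(len(stripped))]
print(mapped)  # [0, 1, 2, 3, 8, 9, 10, 11]
```

An alignment on the stripped sequence may run straight across the junction where the repeat was cut out, but once mapped back it shows up as a gap over the repeat, not as aligned bases inside it.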
