Hello,
I'm new to this mailing list, so apologies if this is the wrong place to 
ask the following question. Feel free to redirect me if needed!
I work in the Bioinformatics Service of our Unit, and someone asked me 
whether they could get a list of the most conserved elements in vertebrates. I 
thought: "Easy, I just have to download the phastConsElements46way table 
and take the highest-scoring ones."
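For anyone wanting to reproduce that ranking, here is a minimal Python sketch. It assumes a tab-separated dump of the table with the columns bin, chrom, chromStart, chromEnd, name, score, where name holds the LOD score as "lod=N" (as in the example below). The column layout is my assumption, so please check it against the table schema; the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    chrom: str
    start: int  # 0-based, half-open (UCSC convention)
    end: int
    lod: int
    score: int

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_elements(lines):
    """Parse rows assumed to be: bin, chrom, chromStart, chromEnd, name ('lod=N'), score."""
    elements = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        _bin, chrom, start, end, name, score = fields[:6]
        lod = int(name.split("=", 1)[1])  # "lod=14726" -> 14726
        elements.append(Element(chrom, int(start), int(end), lod, int(score)))
    return elements


def top_elements(elements, n=10):
    # Rank by LOD first, then by length, highest first.
    return sorted(elements, key=lambda e: (e.lod, e.length), reverse=True)[:n]
```

For example, `top_elements(parse_elements(open("phastConsElements46way.txt")), n=20)`, where the file name is hypothetical.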
I decided to check a few of them "manually" and was horrified to see 
that all (or most) of them appear to be artifacts caused by contaminating 
human genomic DNA in the assemblies of other species.
One example is the longest element:
chr5:69686054-6970347 in GRCh37 (lod=14726, score=995).
It appears to be conserved only in Xenopus and in no other vertebrate 
(judging from the Multiz alignment tracks). When I realigned it to 
the corresponding Xenopus genomic sequence (scaffold_7921:87-17248), it 
was virtually identical (>97% identity over 17 kb): undoubtedly contamination!
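The identity figure is essentially matches divided by aligned, ungapped columns. A minimal Python sketch of that calculation follows, plus a hypothetical rule of thumb whose 97% / 10 kb thresholds are my own choice, loosely based on this example. Aligners such as BLAT or BLAST define identity slightly differently, so their numbers will not match exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity over ungapped columns of one pairwise alignment.

    Assumes both strings come from the same alignment (equal length, '-' for gaps).
    Case is ignored so that soft-masked (lowercase) repeats still count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = aligned = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted
        aligned += 1
        if a == b:
            matches += 1
    return matches / aligned if aligned else 0.0


def looks_like_contamination(aligned_a, aligned_b, min_identity=0.97, min_aligned=10_000):
    # Hypothetical rule of thumb: near-identity over many kb between species as
    # distant as human and frog/fish is implausible, so it points to contaminating DNA.
    ungapped = sum(1 for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-")
    return ungapped >= min_aligned and percent_identity(aligned_a, aligned_b) >= min_identity
```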
Moreover, I looked at several other elements further down the list, and almost 
all of the top (longest) ones are similar: not conserved in any 
vertebrate except Xenopus or zebrafish. These pieces of DNA do 
contain LINE or LTR repeats, so they are present in multiple copies in the 
human genome, but that does not explain such high conservation in 
frog or fish; it can only be explained by genome contamination.
Obviously, this is a problem at the assembly level, but I was also 
wondering whether these elements should be filtered out of the phastCons 
element list?
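One possible filter, sketched in Python under loud assumptions: suppose that for each element we already have, per species, the fraction of its bases covered by the Multiz alignment (extracting that is not shown, and all function names here are hypothetical). An element is flagged when at most one species supports it, after ignoring close relatives such as other primates, which would align to a genuine human sequence anyway.

```python
from typing import Mapping


def supporting_species(coverage: Mapping[str, float], reference="hg19", min_fraction=0.5):
    """Species (other than the reference) whose alignment covers >= min_fraction of the element."""
    return sorted(
        sp for sp, frac in coverage.items() if sp != reference and frac >= min_fraction
    )


def is_suspicious(coverage, reference="hg19", ignore=frozenset(),
                  min_fraction=0.5, max_supporting=1):
    # Flag elements whose "conservation" rests on at most one distant species,
    # e.g. only the frog or only the zebrafish assembly, as in the cases above.
    support = [sp for sp in supporting_species(coverage, reference, min_fraction)
               if sp not in ignore]
    return len(support) <= max_supporting
```

The thresholds (50% coverage, a single supporting species) are arbitrary starting points rather than tested values.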

Philippe

-- 
Philippe Gautier
Bioinformatics Service
MRC - Human Genetics Unit
Western General Hospital
Crewe Road
Edinburgh EH4 2XU
U.K.
tel: 0131 332 24 71



_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
