In some ways, it's not until you have a multiple alignment that you
get the strongest indication that a region is contamination rather
than just highly conserved. This is less of an issue for the
non-mammals, though, where even "ultraconserved" elements are going
to be no more than 90% identical.
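
The reasoning above suggests a simple first-pass check for non-mammal
species. If a human/frog or human/fish aligned pair is well above ~90%
identical, conservation alone is an unlikely explanation. Below is a
minimal Python sketch. It assumes the two rows have already been pulled
out of an alignment as equal-length strings, with '-' marking gaps. The
function names and the 0.90 default are hypothetical.

```python
# Sketch only: the 0.90 ceiling is the rough figure from the remark above,
# not a calibrated threshold.
NON_MAMMAL_IDENTITY_CEILING = 0.90


def percent_identity(ref: str, other: str) -> float:
    """Fraction of identical bases over alignment columns where neither row has a gap."""
    if len(ref) != len(other):
        raise ValueError("aligned rows must have equal length")
    compared = matches = 0
    for a, b in zip(ref.upper(), other.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def exceeds_non_mammal_ceiling(ref: str, other: str,
                               ceiling: float = NON_MAMMAL_IDENTITY_CEILING) -> bool:
    """True if a human vs. non-mammal aligned pair is more identical than the ceiling allows."""
    return percent_identity(ref, other) > ceiling
```

A real filter would also require a minimum aligned length, because
short windows can exceed 90% identity by chance.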

On Feb 6, 2010, at 6:58 AM, Adam Siepel wrote:

> Hi folks -- yes, this is an unfortunate problem, but I've always  
> resisted tackling it at the level of the phastCons tracks.  It  
> really would be best to filter these elements out of the  
> assemblies.  Barring this, I would suggest addressing it at the  
> level of the alignments, because it's not just phastCons that is  
> affected -- any method that makes use of patterns of conservation in  
> the multiple alignments is likely to be confused by these regions.
> Adam
>
> On Feb 5, 2010, at 12:49 PM, Jim Kent wrote:
>
>> I remember facing this issue of conservation via human
>> contamination when we were first doing comparative genomics, when
>> the mouse was sequenced. It's one reason we didn't call the
>> ultraconserved regions at that point. It wasn't until the rat
>> sequence was available, and they were conserved there too, that we
>> were convinced they weren't artifacts. In the process we did flag
>> them in the mouse and get the assemblers to remove the ones where
>> there was not excellent evidence joining them to non-conserved
>> regions in the mouse assembly.
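
The rule described here can be approximated from the assembly side: drop
conserved pieces that have no good link to non-conserved sequence. Below
is a minimal sketch. It assumes you know each contig's length and the
intervals on it that align as conserved to human. It uses "little
non-conserved sequence on the same contig" as a stand-in for "excellent
evidence". In practice that evidence would presumably come from the
assembly itself (read pairs, clone ends and so on), which this ignores.
The function names and the 1,000 bp default are hypothetical.

```python
from typing import Iterable, Tuple


def non_conserved_bases(contig_length: int,
                        conserved: Iterable[Tuple[int, int]]) -> int:
    """Count contig bases outside every conserved interval (0-based, half-open)."""
    covered = 0
    run_start = run_end = None  # current merged run of overlapping intervals
    for start, end in sorted(conserved):
        start, end = max(0, start), min(contig_length, end)  # clip to the contig
        if start >= end:
            continue
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start  # close the previous run
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # extend the current run
    if run_end is not None:
        covered += run_end - run_start
    return contig_length - covered


def removal_candidate(contig_length: int,
                      conserved: Iterable[Tuple[int, int]],
                      min_unconserved: int = 1000) -> bool:
    """Hypothetical rule: flag a contig whose conserved blocks leave under
    min_unconserved bp of other sequence to anchor them in the assembly."""
    return non_conserved_bases(contig_length, conserved) < min_unconserved
```

A contig that is almost entirely one long block of near-human sequence
is flagged. A conserved block embedded in plenty of other sequence is not.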
>>
>> So, I am not surprised this is a problem. The best solution is to
>> get the Xenopus and zebrafish assemblies cleaned up. I'll cc this
>> message to [email protected], the help link for Zebrafish, and
>> to Dan Rokhsar, who I know did some work, at least in the past, on
>> Xenopus.
>> I'll also cc Adam Siepel, the author of phastCons, and our own
>> David Haussler to collect their thoughts on the best way to
>> proceed.
>>
>> Take care
>>      Jim
>>
>> On Feb 5, 2010, at 1:41 AM, Philippe Gautier wrote:
>>
>>> Hello,
>>> I'm new to this mailing list, so apologies if it's the wrong place
>>> to ask the following question. Feel free to redirect me if needed!
>>> I'm working in a bioinformatics service in our Unit, and someone
>>> asked me if they could get a list of the most conserved elements in
>>> vertebrates. I thought, "Easy, I just have to download the
>>> phastConsElements46way table and take the highest-scoring ones."
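
The selection step described here can be sketched in Python. It assumes
the table was saved as tab-separated text with the columns bin, chrom,
chromStart, chromEnd, name (holding a string such as "lod=14726") and
score. That column layout is an assumption, so check the table schema in
the Table Browser before relying on it. The function names are
hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Element(NamedTuple):
    chrom: str
    start: int  # 0-based start, as UCSC tables store it
    end: int
    lod: int
    score: int


def parse_elements(lines: Iterable[str]) -> List[Element]:
    """Parse rows assumed to be: bin, chrom, chromStart, chromEnd, name ("lod=N"), score."""
    elements = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        _bin, chrom, start, end, name, score = line.rstrip("\n").split("\t")[:6]
        lod = int(name.split("=", 1)[1]) if name.startswith("lod=") else int(name)
        elements.append(Element(chrom, int(start), int(end), lod, int(score)))
    return elements


def top_elements(elements: Iterable[Element], n: int) -> List[Element]:
    """Return the n elements with the highest lod, breaking ties by length."""
    return sorted(elements, key=lambda e: (e.lod, e.end - e.start), reverse=True)[:n]
```

As the rest of this message shows, the top of such a list is exactly
where contamination artifacts collect. Treat the output as a starting
point for manual review rather than a finished answer.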
>>> I decided to check a few of them "manually" and was horrified to
>>> see that all (or most) of them seem to be artifacts due to human
>>> genomic DNA contamination in other species.
>>> One example is the longest element, chr5:69686054-6970347 in
>>> GRCh37 (lod=14726, score=995). It looks like it is conserved only
>>> in Xenopus and not in other vertebrates (looking at the Multiz
>>> alignment tracks). And when I realigned it to the corresponding
>>> Xenopus genomic sequence (scaffold_7921:87-17248), it is virtually
>>> identical (>97% over 17 kb): undoubtedly a contamination!
>>> Moreover, I looked at several other elements down the list, and
>>> almost all of the top ones (the longest ones) are similar: not
>>> conserved in any vertebrate except Xenopus or zebrafish. These
>>> pieces of DNA do contain LINE or LTR repeats, so they are present
>>> in the human genome in multiple copies, but that does not explain
>>> such high conservation in frog or fish, which could only be
>>> explained by genome contamination.
>>> Obviously, it is a problem at the assembly level, but I was also
>>> wondering: shouldn't these elements be filtered out of the
>>> phastCons element list?
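
The pattern described here is easy to test once per-species identities
to human have been computed for each element: strong conservation in
exactly one distant species, and little or none elsewhere. Below is a
minimal sketch under that assumption. The species labels, the 0.90/0.60
thresholds and the function name are hypothetical. Following the point
made at the top of the thread, it is best applied to non-mammals, where
real conservation rarely reaches those levels.

```python
from typing import Mapping, Optional


def single_species_support(identities: Mapping[str, float],
                           high: float = 0.90,
                           low: float = 0.60) -> Optional[str]:
    """Return the one species carrying an element's conservation, or None.

    identities maps species -> identity to human (0..1) over the element;
    species with no aligned sequence can simply be left out.
    """
    strong = [sp for sp, ident in identities.items() if ident >= high]
    if len(strong) != 1:
        return None  # zero or several well-conserved species: not this pattern
    # Every other aligned species must be clearly weaker for the flag to fire.
    rest_weak = all(ident < low for sp, ident in identities.items() if sp != strong[0])
    return strong[0] if rest_weak else None
```

For example, an element at 97.5% identity in Xenopus but only weakly
aligned in mouse and zebrafish is attributed to Xenopus alone. An element
conserved at high identity in several species is left alone.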
>>>
>>> Philippe
>>>
>>> -- 
>>> Philippe Gautier
>>> Bioinformatics Service
>>> MRC - Human Genetics Unit
>>> Western General Hospital
>>> Crewe Road
>>> Edinburgh EH4 2XU
>>> U.K.
>>> tel: 0131 332 24 71
>>>
>>> _______________________________________________
>>> Genome maillist  -  [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
>
