Re: [Genome] Question about the RepeatMasker track

Mary Goldman Wed, 13 Jul 2011 14:29:21 -0700

Hi Marco,

It looks like your contig names are from NCBI and, unfortunately, we do 
not have a way to convert those names into our assembled chromosome name 
space. If you search on NCBI for your contig name, it will give you the 
scaffold name, which you can then use with our scaffold track/table to 
convert into assembled chromosome name space.
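
For anyone scripting that last step, the arithmetic can be sketched in Python as below. This is only an illustration, not a UCSC tool. It assumes you already have the one row of the scaffold/assembly table that places your fragment on a chromosome, that the table uses 0-based half-open coordinates, and that BLAST reports 1-based inclusive positions. The names `AssemblyRow` and `frag_to_chrom`, and the numbers in the usage example, are made up for illustration.

```python
from typing import NamedTuple, Tuple


class AssemblyRow(NamedTuple):
    """One placement row (hypothetical field names, AGP-like layout)."""
    chrom: str        # chromosome name, e.g. "chr8"
    chrom_start: int  # 0-based start of the fragment on the chromosome
    chrom_end: int    # exclusive end of the fragment on the chromosome
    frag: str         # scaffold/contig name
    frag_start: int   # 0-based start of the placed part within the fragment
    frag_end: int     # exclusive end of the placed part within the fragment
    strand: str       # "+" or "-" orientation of the fragment


def frag_to_chrom(row: AssemblyRow, start: int, end: int) -> Tuple[str, int, int]:
    """Map a 1-based inclusive hit on row.frag to 1-based inclusive chrom coords."""
    if start > end:  # BLAST reports minus-strand hits with start > end
        start, end = end, start
    s0, e0 = start - 1, end  # switch to 0-based half-open
    if s0 < row.frag_start or e0 > row.frag_end:
        raise ValueError("hit lies outside the placed part of the fragment")
    if row.strand == "+":
        c0 = row.chrom_start + (s0 - row.frag_start)
        c1 = row.chrom_start + (e0 - row.frag_start)
    else:  # fragment is reverse-complemented on the chromosome
        c0 = row.chrom_start + (row.frag_end - e0)
        c1 = row.chrom_start + (row.frag_end - s0)
    return row.chrom, c0 + 1, c1  # back to 1-based inclusive


if __name__ == "__main__":
    # Invented placement and hit, purely to show the call.
    row = AssemblyRow("chrN", 1000, 2000, "fragA", 0, 1000, "+")
    print(frag_to_chrom(row, 1, 10))
```

The strand handling matters: for a fragment placed on the minus strand, the hit's position is counted from the far end of the fragment.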


Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 7/7/11 1:43 AM, Marco Santagostino wrote:
> Hello,
>
> I think the contig names are different. Here is an example of the
> BLAST results found:
>
> gi|194214692|ref|NW_001867430.1|Eca8_WGA28_2 Equus caballus chromosome 
> 8 genomic contig, reference assembly (based on EquCab2)
>
> from the hit table:
> subject ids, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
> gi|194214692|ref|NW_001867430.1|Eca8_WGA28_2    98.25    228    4    0    1    228    835135    835362    2e-109    399
>
> I suppose the contig name is NW_001867430.1, which is also the
> accession number, but it does not seem to match those found in the
> Genome Browser. This sequence should match this locus (found with BLAT):
>
>     ACTIONS      QUERY     SCORE  START   END  QSIZE  IDENTITY  CHRO  STRAND  START     END       SPAN
> ------------------------------------------------------------------------------------------------------
> browser details  YourSeq    2228      1  2228   2228    100.0%     8       +  30966059  30968286  2228
>
> browser: <http://genome.ucsc.edu/cgi-bin/hgTracks?position=chr8:30966059-30968286&db=equCab2&ss=../trash/hgSs/hgSs_genome_7264_570d30.pslx+../trash/hgSs/hgSs_genome_7264_570d30.fa&hgsid=201920499>
> details: <http://genome.ucsc.edu/cgi-bin/hgc?o=30966058&g=htcUserAli&i=../trash/hgSs/hgSs_genome_7264_570d30.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_7264_570d30.fa+YourSeq&c=chr8&l=30966058&r=30968286&db=equCab2&hgsid=201920499>
>
> which should be placed in "contig_21911" according to the Genome Browser (I
> do not know the exact position).
>
>
> All the best,
>
> Marco
>
>
> On 07/07/11 01:31, Mary Goldman wrote:
>> Hi Marco,
>>
>> We are unsure whether your contig names are from the Broad Institute (the
>> organization that performed the sequencing) or NCBI. Can you please
>> check the Assembly track and see if your contig names match the ones 
>> in this track (here is a link for equCab2: 
>> http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=equCab2&g=gold)? If they 
>> do, you can download the data in this track to convert your 
>> coordinates. If not, please send us an example of your contig names 
>> and we can see if we have a conversion file.
>>
>> Best,
>> Mary
>> ------------------
>> Mary Goldman
>> UCSC Bioinformatics Group
>>
>> On 7/5/11 10:09 AM, Marco Santagostino wrote:
>>> Dear Sirs,
>>>
>>> I worked a bit with the RepeatMasker track, but I found that, oddly, the
>>> consensus sequence of the transposable element (which we are
>>> investigating) used to mask the genome (and in general used by
>>> RepeatMasker) is different from that annotated in RepBase (and which we
>>> used for some preliminary analysis). I can download the hit list
>>> generated by BLAST using "our" consensus sequence, but I don't have the
>>> coordinates in the ordered horse genome for each BLAST hit, I have just
>>> the coordinates in the contig sequences; is there a way to submit this
>>> hit list (in csv or txt format, or whatever) to Table Browser and
>>> retrieve the coordinates in the horse genome for each hit?
>>>
>>> Thanks,
>>>
>>> Marco
>>>
>>>
>>>
>>> On 14/06/11 00:25, Greg Roe wrote:
>>>> Hi Marco,
>>>>
>>>> There is some information on the track info page for the RepeatMasker
>>>> track. Click the track title. There is also some info in the
>>>> downloads' README file:
>>>> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ (see esp.
>>>> chromOut.tar.gz)
>>>>
>>>> To set up the Table browser so it recovers only the elements with at
>>>> least 90% of identity with the consensus sequence....
>>>>
>>>> (For some background, a definition of RepeatMasker output columns can
>>>> be found here: http://repeatmasker.org/webrepeatmaskerhelp.html )
>>>>
>>>> The 2nd, 3rd and 4th columns of the .out files are useful:
>>>>
>>>>    15.6    = % substitutions in matching region compared to the consensus
>>>>    6.2     = % of bases opposite a gap in the query sequence (deleted bp)
>>>>    0.0     = % of bases opposite a gap in the repeat consensus (inserted bp)
>>>>
>>>> In our database table, those are multiplied by 10 in order to get
>>>> integer parts-per-thousand, and called milliDiv (substitutions),
>>>> milliDel and milliIns.
>>>>
>>>> The simplest % identity measurement is milliDiv only -- if you wish,
>>>> you can factor in milliDel and milliIns too.
>>>>
>>>> So, to get % identity >= 90% in the Table Browser, create a filter
>>>> with milliDiv <= 100 (milliDiv is divergence in parts per thousand,
>>>> so at least 90% identity means at most 10% substitutions).
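
The arithmetic behind that filter can be sketched in a few lines of Python. This is an illustration only, not how the Table Browser works internally. It treats milliDiv, milliDel and milliIns as the parts-per-thousand values described above; the function names and the `include_indels` switch are made up for this example.

```python
def percent_identity(milli_div: int, milli_del: int = 0, milli_ins: int = 0,
                     include_indels: bool = False) -> float:
    """Rough % identity from parts-per-thousand divergence columns.

    By default only substitutions (milliDiv) count against identity;
    with include_indels=True, milliDel and milliIns are subtracted too.
    """
    per_thousand = milli_div
    if include_indels:
        per_thousand += milli_del + milli_ins
    return 100.0 - per_thousand / 10.0


def max_milli_div(min_identity_pct: float) -> int:
    """Largest milliDiv that still meets a minimum % identity (substitutions only)."""
    return round((100.0 - min_identity_pct) * 10)


if __name__ == "__main__":
    # The .out example row above: 15.6% substitutions, 6.2% deleted, 0.0% inserted.
    print(percent_identity(156, 62, 0))
    print(percent_identity(156, 62, 0, include_indels=True))
    # Thresholds to type into a milliDiv filter for 90% and 98% identity.
    print(max_milli_div(90), max_milli_div(98))
```

Whether gaps should count against identity depends on the analysis, which is why the indel terms are optional here.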
>>>>
>>>> Please let us know if you have any additional questions:
>>>> [email protected]
>>>>
>>>> -
>>>> Greg Roe
>>>> UCSC Genome Bioinformatics Group
>>>>
>>>>
>>>> On 6/13/11 9:32 AM, Marco Santagostino wrote:
>>>>> Dear Sirs,
>>>>>
>>>>> where can I find the parameters used to generate the RepeatMasker track?
>>>>> The problem is as follows: I need to extract a certain repetitive
>>>>> element from the horse genome, and I'm supposed to classify all the hits
>>>>> found according to their identity (with respect to the consensus
>>>>> sequence). Some colleagues of mine already retrieved all the sequences
>>>>> with at least 98% identity by BLAST search, so now I'm supposed to find
>>>>> those with a lower identity, but I can't work out how to set up
>>>>> the Table Browser so that it finds the elements with the identity that
>>>>> I choose. How do I set up the Table Browser so that, for example, it
>>>>> returns only the elements with at least 90% identity to the consensus
>>>>> sequence?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Marco Santagostino
>>>>>
>>>>>
>>>>>
>>>
>>
>
>
> -- 
> Marco Santagostino, PhD
> Lab. Molecular and Cellular Biology
> Dept. Genetics and Microbiology, University of Pavia
> Ferrata street, 1 - 27100 Pavia, Italy
> Tel.:    +39 0382 985540
> Fax:     +39 0382 528496
> e-mail:[email protected]
>
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
