Dear Sirs,

I worked a bit with the RepeatMasker track, but I found that, oddly, the consensus sequence of the transposable element we are investigating that was used to mask the genome (and that RepeatMasker uses in general) differs from the one annotated in RepBase, which we used for some preliminary analysis.

I can download the hit list generated by BLAST with "our" consensus sequence, but for each hit I have only the coordinates in the contig sequences, not the coordinates in the ordered horse genome. Is there a way to submit this hit list (in csv, txt, or any other format) to the Table Browser and retrieve the horse genome coordinates for each hit?

Thanks,
Marco

On 14/06/11 00:25, Greg Roe wrote:
> Hi Marco,
>
> There is some information on the track info page for the RepeatMasker
> track. Click the track title. There is also some info in the
> downloads' README file:
> http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/ (see esp.
> chromOut.tar.gz)
>
> To set up the Table browser so it recovers only the elements with at
> least 90% of identity with the consensus sequence....
>
> (For some background, a definition of RepeatMasker output columns can
> be found here: http://repeatmasker.org/webrepeatmaskerhelp.html )
>
> The 2nd, 3rd and 4th columns of the .out files are useful:
>
> 15.6 = % substitutions in matching region compared to the consensus
> 6.2 = % of bases opposite a gap in the query sequence (deleted bp)
> 0.0 = % of bases opposite a gap in the repeat consensus
> (inserted bp)
>
> In our database table, those are multiplied by 10 in order to get
> integer parts-per-thousand, and called milliDiv (substitutions),
> milliDel and milliIns.
>
> The simplest % identity measurement uses milliDiv only; if you wish,
> you can factor in milliDel and milliIns too.
>
> So, to get % identity >= 90% in the Table Browser, create a filter
> with milliDiv <= 100 (milliDiv counts substitutions in parts per
> thousand, so at most 10% divergence means at most 100).
>
> Please let us know if you have any additional questions:
> [email protected]
>
> -
> Greg Roe
> UCSC Genome Bioinformatics Group
>
>
> On 6/13/11 9:32 AM, Marco Santagostino wrote:
>> Dear Sirs,
>>
>> where can I find the parameters used to generate the RepeatMasker track?
>> The problem is as follows: I need to take from the horse genome a
>> certain repetitive element, and I'm supposed to classify all the hits
>> found according to their identity (with respect to the consensus
>> sequence). Some colleagues of mine already took all the sequences with at
>> least 98% of identity by BLAST search, so now I'm supposed to find
>> those which have a lower identity, but I can't find out how to set up
>> the Table Browser so that it finds the elements with the identity that I
>> chose. How do I set up the Table Browser so that, for example, it recovers
>> only the elements with at least 90% of identity with the consensus
>> sequence?
>>
>> Thank you,
>>
>> Marco Santagostino
>>
>>
>>
>

--
Marco Santagostino, PhD
Lab. Molecular and Cellular Biology
Dept. Genetics and Microbiology, University of Pavia
Ferrata street, 1 - 27100 Pavia, Italy
Tel.: +39 0382 985540
Fax: +39 0382 528496
e-mail: [email protected]

_______________________________________________
Genome maillist - [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
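
For anyone who prefers to apply the identity filter offline to rows exported from the Table Browser, the arithmetic described in the quoted reply can be sketched in Python as below. This is only an illustration under assumptions: the column names milliDiv, milliDel and milliIns come from the thread, while the function names, the dictionary row layout and the example rows are hypothetical. Because milliDiv counts substitutions per thousand bases, an identity of at least 90% corresponds to milliDiv of at most 100.

```python
def percent_identity(milli_div, milli_del=0, milli_ins=0):
    """Approximate % identity from parts-per-thousand columns.

    With the defaults only substitutions (milliDiv) are counted;
    pass milliDel and milliIns to factor in deleted/inserted bases.
    """
    return 100.0 - (milli_div + milli_del + milli_ins) / 10.0


def filter_by_identity(rows, min_identity=90.0, include_indels=False):
    """Keep rows whose approximate identity is >= min_identity (in %).

    The comparison is done on integer parts-per-thousand so that the
    boundary case (e.g. exactly 90%) is not affected by float rounding.
    """
    max_milli = round((100.0 - min_identity) * 10)  # 90% -> 100
    kept = []
    for row in rows:
        total = row["milliDiv"]
        if include_indels:
            total += row["milliDel"] + row["milliIns"]
        if total <= max_milli:
            kept.append(row)
    return kept


# Made-up rows for illustration; the first uses the 15.6 / 6.2 / 0.0
# percentages from the example in the thread, multiplied by 10.
example_rows = [
    {"repName": "hypA", "milliDiv": 156, "milliDel": 62, "milliIns": 0},
    {"repName": "hypB", "milliDiv": 80, "milliDel": 30, "milliIns": 0},
    {"repName": "hypC", "milliDiv": 100, "milliDel": 0, "milliIns": 0},
]

if __name__ == "__main__":
    for row in filter_by_identity(example_rows, min_identity=90.0):
        print(row["repName"], percent_identity(row["milliDiv"]))
```

Setting include_indels=True applies the stricter reading, in which deleted and inserted bases also count against identity; which definition is appropriate depends on how identity was measured in the earlier BLAST-based selection.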
