Hi Firoz,

It could be that the specific HERV you're looking for is not in the 
library used by RepeatMasker. If you need the consensus sequences, 
register with the provider of the libraries at 
http://girinst.org/accountservices/register.php and then grab the 
RepeatMasker edition (and/or other formats) from 
http://girinst.org/server/RepBase/index.php.
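
Once you have a library file, one quick way to check whether a given HERV is in it is to list the matching entries with their consensus lengths. The following is only a minimal sketch in Python. It assumes you downloaded a FASTA-format copy of the library; the function name herv_lengths and the toy records are made up for illustration and are not real consensus sequences.

```python
import io


def herv_lengths(handle, keyword="HERV"):
    """Return {entry name: sequence length} for FASTA records whose
    header line contains `keyword` (case-insensitive)."""
    lengths = {}
    name = None  # name of the record being counted, or None if skipped
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The first whitespace-delimited token of the header is the name.
            fields = line[1:].split()
            token = fields[0] if fields else ""
            name = token if keyword.lower() in line.lower() else None
            if name is not None:
                lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy records for illustration only (hypothetical names and sequences).
example = io.StringIO(
    ">HERVX-int\tERV\tHomo sapiens\nACGTACGT\nACGT\n>AluY\nGGCC\n"
)
print(herv_lengths(example))
```

With a real download you would pass an open file handle instead of the io.StringIO object and compare the reported lengths against the sequence you expect.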

If you want the genomic locations of a particular HERV, BLAT can quickly 
find matches with high percent identity. This page has the FASTA sequence 
for your HERV-K example: 
http://www.ncbi.nlm.nih.gov/nuccore/M14123?report=fasta&log$=seqview&format=text.

If you copy and paste the sequence into BLAT 
(http://genome.ucsc.edu/cgi-bin/hgBlat), it will find quite a few 
full-length matches.
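
If you save the BLAT results in PSL format, the near-full-length matches can be pulled out with a short script. This is a rough sketch in Python under stated assumptions: it relies on the standard 21-column PSL layout, the function name full_length_hits and the 0.95 thresholds are arbitrary choices of mine, and the identity it computes is a simple matched-over-aligned ratio, not the percent identity shown on the BLAT results page.

```python
def full_length_hits(psl_lines, min_coverage=0.95, min_identity=0.95):
    """Return PSL hits that span most of the query at high identity.

    Each hit is (tName, tStart, tEnd, strand, coverage, identity).
    """
    hits = []
    for line in psl_lines:
        cols = line.rstrip("\n").split("\t")
        # Skip PSL header lines and anything that is not a 21-column row.
        if len(cols) < 21 or not cols[0].isdigit():
            continue
        matches, mismatches, rep_matches = (int(c) for c in cols[0:3])
        strand = cols[8]
        q_size, q_start, q_end = (int(c) for c in cols[10:13])
        t_name, t_start, t_end = cols[13], int(cols[15]), int(cols[16])
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        # Fraction of the query spanned by the alignment.
        coverage = (q_end - q_start) / q_size
        # Simple identity: matching bases over aligned bases (gaps ignored).
        identity = (matches + rep_matches) / aligned
        if coverage >= min_coverage and identity >= min_identity:
            hits.append((t_name, t_start, t_end, strand,
                         round(coverage, 3), round(identity, 3)))
    return hits


# Two made-up PSL rows: one full-length hit and one short partial hit.
full = "\t".join(["990", "10", "0", "0", "0", "0", "0", "0", "+", "query",
                  "1000", "0", "1000", "chr1", "5000000", "100", "1100",
                  "1", "1000,", "0,", "100,"])
partial = "\t".join(["300", "0", "0", "0", "0", "0", "0", "0", "-", "query",
                     "1000", "0", "300", "chr2", "5000000", "500", "800",
                     "1", "300,", "0,", "500,"])
print(full_length_hits([full, partial]))
```

The target coordinates in each hit are the PSL tStart and tEnd values, which are zero-based with an exclusive end.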

I hope this information is helpful.  Please feel free to contact the 
mail list again if you require further assistance.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 6/28/11 12:12 AM, Firoz Anwar wrote:
> Hi,
> I am a PhD student at the University of New South Wales, working on 
> retroelements in the human genome. The procedure I have followed is:
> 1. I downloaded rmsk.txt.gz from the UCSC genome website.
> 2. Using the rmsk.sql definition, which gives each repeat's name and genomic 
> location, I wrote a program to extract the repeat nucleotide sequences from 
> the 2009 human reference sequence (GRCh37).
>
> After extracting rmsk.txt.gz to rmsk.txt (a large file, ~455 MB), I found 
> that the longest HERV sequence in this file is HERVS71-int, at 8909 bp.
>
> But according to the literature, a reported HERV-K, with the 
> corresponding NCBI sequence (GenBank accession no. 
> M14123<http://www.ncbi.nlm.nih.gov/nuccore/182227>), is 9109 bp.
>
> So my question is: am I looking at the right UCSC file? The NCBI sequence 
> above is a human endogenous retrovirus sequence, so I assumed it would 
> automatically be included in the rmsk file.
>
> Please advise me in this regard.
>
> Firoz
> ______________________________
> Firoz Anwar
> Complex System in Biology Group
> Centre for Vascular Research (CVR)
> Lowy Cancer Research Building
> Level 4
>
> University of New South Wales
> Email: [email protected]<mailto:[email protected]>
> Mobile # +61 0413185168
>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
