Hi,

I am a PhD student at the University of New South Wales, working on retroelements in the human genome. The procedure I have followed is:

1. I downloaded rmsk.txt.gz from the UCSC Genome Browser website.
2. Using the rmsk.sql table definition, which gives each repeat's name and genomic location, I wrote a program to extract the repeat nucleotide sequences from the 2009 human reference sequence (GRCh37).
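For reference, the two steps above can be sketched roughly as follows. This is only a minimal illustration, not my actual program: the column positions are assumed from the hg19 rmsk.sql definition (genoName, genoStart, genoEnd, strand and repName at 0-based positions 5, 6, 7, 9 and 10), the function names are made up for this example, and the genome is assumed to be already loaded as a plain dictionary of chromosome name to sequence string.

```python
# Column positions (0-based) assumed from the hg19 rmsk.sql table definition.
GENO_NAME, GENO_START, GENO_END, STRAND, REP_NAME = 5, 6, 7, 9, 10

# Translation table used to complement a nucleotide sequence.
_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def parse_rmsk(lines):
    """Yield (chrom, start, end, strand, rep_name) from tab-separated rmsk rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= REP_NAME:
            continue  # skip blank or truncated rows
        yield (
            fields[GENO_NAME],
            int(fields[GENO_START]),
            int(fields[GENO_END]),
            fields[STRAND],
            fields[REP_NAME],
        )


def longest_by_name(records, prefix="HERV"):
    """Map each repeat name starting with `prefix` to its longest annotated length."""
    longest = {}
    for _chrom, start, end, _strand, name in records:
        if name.startswith(prefix):
            length = end - start  # UCSC table coordinates are 0-based, half-open
            if length > longest.get(name, 0):
                longest[name] = length
    return longest


def extract_sequence(genome, chrom, start, end, strand):
    """Slice one annotation out of `genome` (hypothetical dict: chrom -> sequence).

    Minus-strand annotations are returned reverse-complemented.
    """
    seq = genome[chrom][start:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

To run this on the real file, the rows can be read with `gzip.open("rmsk.txt.gz", "rt")` and passed straight to `parse_rmsk`. Note that the lengths here are per annotation row (genoEnd minus genoStart), which is how I obtained the 8909 bp figure below.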
I extracted rmsk.txt.gz to rmsk.txt (a large file, ~455 MB). The longest HERV sequence I find in this file is HERVS71-int, at 8909 bp. However, according to the literature, some reported HERV-K sequences are longer: the corresponding NCBI sequence (GenBank accession no. M14123, http://www.ncbi.nlm.nih.gov/nuccore/182227) is 9109 bp.

So my question is: am I looking at the right UCSC file? Since the NCBI sequence above is a human endogenous retrovirus sequence, I assumed it would automatically be included in the rmsk file. Please advise me in this regard.

Firoz
______________________________
Firoz Anwar
Complex System in Biology Group
Centre for Vascular Research (CVR)
Lowy Cancer Research Building, Level 4
University of New South Wales
Email: [email protected]
Mobile # +61 0413185168
