Hi, I am having a little trouble interpreting the RepeatMasker file that I downloaded from the UCSC genome website. The "rmsk.txt.gz" file, which I downloaded from this link <http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz>, contains repeat information for the human genome. One of its fields holds the "strand", which is either positive or negative. As an example, take a repeat named L1MA3, a member of the LINE family, located on chromosome 1 at position 105002847 to 105010329 on the negative strand. Since only the positive-strand chromosome sequence is available, does this position refer to the 5' to 3' direction on the positive strand of chromosome 1, or do I need to generate the negative strand by complementing the positive strand and then count backward to that position?
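To make the two readings concrete, here is a minimal Python sketch on a made-up ten-base sequence. The function names and the 1-based, inclusive positions are my own assumptions for illustration only; they are not taken from the UCSC documentation, and which reading (if either) matches the rmsk file is exactly what I am asking.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def plus_strand_reading(plus_seq, start, end):
    """Reading 1 (hypothetical helper): positions are counted from the 5' end
    of the positive strand; the slice is then reverse-complemented to give
    the negative-strand sequence 5' to 3'."""
    return reverse_complement(plus_seq[start - 1:end])


def minus_strand_reading(plus_seq, start, end):
    """Reading 2 (hypothetical helper): positions are counted from the 5' end
    of the negative strand, i.e. along the reverse complement."""
    return reverse_complement(plus_seq)[start - 1:end]


plus = "ACCTTGGCTG"  # toy positive strand, 5' to 3'
print(plus_strand_reading(plus, 1, 3))   # GGT
print(minus_strand_reading(plus, 1, 3))  # CAG
```

The two readings give different answers for the same record (GGT versus CAG here), so picking the wrong one would extract the wrong repeat sequence.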
To simplify what I mean: say the positive-strand 5' to 3' sequence is ACCTTGGCTG, positions start at 1, and the length is 10, so the last position is 10. If a repeat is located at positions 1 to 3 on the positive strand, then the repeat sequence is "ACC". But if the repeat is at positions 1 to 3 on the negative strand, does that mean I should take the last 3 letters, CTG, complement them to get GAC, and then, because the negative strand runs in the reverse direction, reverse that so the sequence I am after is CAG? I know this may be confusing, but I have asked a few people and nobody seems to be an expert on the UCSC repeat-masking annotation procedure, so I would really appreciate it if you could take the time to answer my query. Thanks a lot for your time and cooperation.

Firoz
----------------------------------
Firoz Anwar
Complex System in Biology Group
Centre for Vascular Research (CVR)
Lowy Cancer Research Centre
University of New South Wales
Australia
