Hi,

I downloaded the hg19 refFlat.txt.gz file from UCSC 
(ftp://genome-ftp.cse.ucsc.edu/goldenPath/hg18/database/refFlat.txt.gz). It has 
multiple lines for a single gene. For instance, the command:

   % grep -e DEFB106A -e DEFB106B refFlat.txt

generates the four lines:

DEFB106A  NM_152251     chr8  -  7340025  7343909  7340125  7343904  2  7340025,7343855,  7340274,7343909,
DEFB106A  NM_152251     chr8  +  7682693  7686575  7682698  7686475  2  7682693,7686326,  7682747,7686575,
DEFB106B  NM_001040704  chr8  -  7340025  7343909  7340125  7343904  2  7340025,7343855,  7340274,7343909,
DEFB106B  NM_001040704  chr8  +  7682693  7686575  7682698  7686475  2  7682693,7686326,  7682747,7686575,

In other words, DEFB106A and DEFB106B have exactly the same annotation: each 
gene is listed at both loci. I realize this is because of the duplicated 
regions, but NCBI gives a single location for each gene:
 * DEFB106A: Chromosome 8 (7682694..7686575)
 * DEFB106B: Chromosome 8 (7340026..7343909, complement)

Is there any way to get a refFlat file that contains only the NCBI version of 
the gene coordinates (or as close to those coordinates as possible)?
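
In case it helps to show what I mean, here is a minimal Python sketch of the 
filtering I would otherwise do by hand. It is only an illustration: the 
function names are made up, the NCBI table is typed in from the two locations 
quoted above, and it assumes that refFlat start positions are 0-based while 
the NCBI ones are 1-based (which is how 7682693 vs. 7682694 looks to me). For 
each gene it keeps the refFlat record on the same chromosome whose strand and 
transcript span best match the NCBI location.

```python
# Sketch only: function names and the hand-typed NCBI table are hypothetical.

def parse_refflat_line(line):
    """Pull out the refFlat fields needed for the comparison."""
    fields = line.split()  # the real file is tab-separated; split() handles both
    return {
        "gene": fields[0],
        "chrom": fields[2],
        "strand": fields[3],
        "tx_start": int(fields[4]),  # assumed 0-based
        "tx_end": int(fields[5]),
        "line": line.rstrip("\n"),
    }


def pick_closest(lines, ncbi):
    """For each gene in `ncbi`, keep the record nearest its NCBI location."""
    best = {}
    for line in lines:
        if not line.strip():
            continue
        rec = parse_refflat_line(line)
        target = ncbi.get(rec["gene"])
        if target is None or target[0] != rec["chrom"]:
            continue
        _, strand, start, end = target
        # Prefer a matching strand, then the smallest coordinate difference.
        # The +1 converts the assumed 0-based start to 1-based.
        score = (
            rec["strand"] != strand,
            abs(rec["tx_start"] + 1 - start) + abs(rec["tx_end"] - end),
        )
        if rec["gene"] not in best or score < best[rec["gene"]][0]:
            best[rec["gene"]] = (score, rec)
    return {gene: rec for gene, (_, rec) in best.items()}


# The four records from the grep output above, one per line.
rows = [
    "DEFB106A NM_152251 chr8 - 7340025 7343909 7340125 7343904 2 7340025,7343855, 7340274,7343909,",
    "DEFB106A NM_152251 chr8 + 7682693 7686575 7682698 7686475 2 7682693,7686326, 7682747,7686575,",
    "DEFB106B NM_001040704 chr8 - 7340025 7343909 7340125 7343904 2 7340025,7343855, 7340274,7343909,",
    "DEFB106B NM_001040704 chr8 + 7682693 7686575 7682698 7686475 2 7682693,7686326, 7682747,7686575,",
]

# NCBI locations as quoted above (1-based; "complement" taken to mean "-").
ncbi = {
    "DEFB106A": ("chr8", "+", 7682694, 7686575),
    "DEFB106B": ("chr8", "-", 7340026, 7343909),
}

picked = pick_closest(rows, ncbi)
for gene in sorted(picked):
    print(picked[gene]["line"])
```

This obviously does not scale to the whole file, since I would need the NCBI 
location of every gene, which is why I am hoping a ready-made file exists.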

Thanks,
Vamsi

_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
