Hello Jennifer,

Thanks for your help. I was confused about the 0-based counting, specifically
what to use for the start and end positions, which is why my sequence was not
matching. Your wiki link on coordinate transforms cleared that up, and I am now
getting the right sequence.

Thanks,
Lipika

On Wed, Sep 8, 2010 at 4:23 PM, Jennifer Jackson <[email protected]> wrote:

> Hello Lipika,
>
> Perhaps some help understanding the coordinate system used by UCSC will
> help. We use a 0-based start position. This can get tricky, especially when
> converting to the (-) strand, since we also store all coordinates
> smallest->largest along the chromosome.
>
> Help is located in this wiki:
> http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
>
> All database tables/files will be formatted this way unless specifically
> noted in the data format FAQ:
> http://genome.ucsc.edu/FAQ/FAQformat.html
>
> There are utilities readily available that work with our coordinate system.
> Some function stand-alone and others require a database. The public MySQL
> database can be used when a database is required, if you do not run your own
> mirror.
>
> A list of utilities is here:
> http://hgwdev.cse.ucsc.edu/~larrym/utilities.html
>
> Many can be downloaded pre-compiled from here (for certain platforms):
> http://hgdownload.cse.ucsc.edu/admin/exe/
>
> Otherwise, obtain the source and compile locally:
> http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads
>
> Public MySQL access instructions:
> http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29
>
> Please feel free to contact the mailing list support team again if you
> would like more assistance.
>
> Warm regards,
>
> Jen
> UCSC Genome Browser Support
>
>
> On 9/8/10 11:35 AM, Lipika Ray wrote:
>
>> Hello UCSC group,
>>
>> I would like to get the coding sequences of genes from RefSeq mRNA IDs
>> (like NM_003820) in hg18, for a big list of such IDs.
>>
>> So I am getting the exonStarts, exonEnds, cdsStart, and cdsEnd values from
>> the refFlat table under hg18.
>>
>> So for NM_003820, the record looks like this:
>>
>> geneName: TNFRSF14
>> name: NM_003820
>> chrom: chr1
>> strand: -
>> txStart: 2479150
>> txEnd: 2486613
>> cdsStart: 2479705
>> cdsEnd: 2486314
>> exonCount: 8
>> exonStarts: 2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,
>> exonEnds: 2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,
>>
>> To get the DNA sequence corresponding to the coding regions, I am reading
>> the chr1.fa.gz file under chromosomes in hg18 and then extracting the DNA
>> sequence corresponding to these regions:
>>
>> 2479705-2479831, 2480082-2480114, 2481163-2481306, 2482264-2482355,
>> 2483000-2483156, 2484510-2484636, 2485144-2485253, 2486245-2486314
>>
>> The resulting sequence does not match when I cross-check it against the
>> sequence from the web. Can you please tell me whether I can extract the
>> sequence in this way, or whether you already have sequences corresponding
>> to genes stored separately in your database?
>>
>> Thanks for your help.
>>
>> Lipika
>> _______________________________________________
>> Genome maillist - [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
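
For anyone finding this thread later, here is a minimal Python sketch of the coordinate arithmetic discussed above. It assumes the refFlat convention described in the linked wiki: starts are 0-based and ends are 1-based, so each interval is half-open and maps directly onto a Python slice, while the equivalent 1-based range is start+1 to end. The function names (parse_int_list, cds_blocks, cds_sequence, to_one_based) are made up for illustration and are not UCSC utilities. The sketch also assumes the chromosome has already been loaded as one plain string, with the FASTA header line and line breaks removed.

```python
def parse_int_list(field):
    """Parse a refFlat list such as '10,20,30,' (a trailing comma is allowed)."""
    return [int(x) for x in field.split(",") if x.strip()]


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to the CDS; intervals stay 0-based, half-open."""
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s = max(start, cds_start)
        e = min(end, cds_end)
        if s < e:  # skip exons that lie entirely in the UTR
            blocks.append((s, e))
    return blocks


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cds_sequence(chrom_seq, blocks, strand):
    """Join the blocks in chromosome order, then flip for the (-) strand."""
    seq = "".join(chrom_seq[s:e] for s, e in blocks)
    return reverse_complement(seq) if strand == "-" else seq


def to_one_based(blocks):
    """Convert 0-based half-open blocks to 1-based inclusive ranges."""
    return [(s + 1, e) for s, e in blocks]


# The NM_003820 record quoted in this thread (hg18 refFlat).
exon_starts = parse_int_list(
    "2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,")
exon_ends = parse_int_list(
    "2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,")
blocks = cds_blocks(exon_starts, exon_ends, 2479705, 2486314)
cds_length = sum(e - s for s, e in blocks)
print(cds_length)               # 852, a multiple of 3
print(to_one_based(blocks)[0])  # (2479706, 2479831)
```

With a loaded chromosome string, cds_sequence(chrom_seq, blocks, "-") would give the coding sequence in transcript orientation. Two details matter for a (-) strand gene like this one: the blocks are concatenated in chromosome order first, and the whole joined string is then reverse-complemented, rather than each exon separately in place.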
