Re: [Genome] Download coding sequence bulk

Lipika Ray Thu, 09 Sep 2010 09:04:28 -0700

Hello Jennifer,

Thanks for your help. I was confused about 0-based counting and what exactly
to use for the start and end, which is why the sequence was not matching. Your
wiki link on coordinate transforms cleared that up, and I am now getting the
right sequence.

Thanks,


Lipika
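
As a rough illustration of the coordinate issue discussed in this thread: UCSC
table intervals have a 0-based start and a 1-based end, which is the same
half-open convention that Python slicing uses. The sketch below is not a UCSC
utility. The function names (to_one_based, reverse_complement, extract_cds) and
the toy sequence are made up for illustration, and it assumes the chromosome
has already been loaded into one string with the FASTA header and line breaks
removed.

```python
# Complement table for DNA letters; N stays N, case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_one_based(start, end):
    """Convert a 0-based, half-open interval to 1-based, inclusive."""
    return start + 1, end


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cds(chrom_seq, blocks, strand):
    """Join (start, end) blocks given in 0-based, half-open coordinates.

    Blocks are expected smallest->largest along the chromosome. For the
    (-) strand the joined sequence is reverse-complemented once at the end.
    """
    cds = "".join(chrom_seq[start:end] for start, end in blocks)
    return reverse_complement(cds) if strand == "-" else cds


# Toy data (hypothetical), only to show the slicing convention.
toy_chrom = "AACCGGTTAACC"
toy_blocks = [(2, 4), (6, 8)]
print(extract_cds(toy_chrom, toy_blocks, "+"))
print(extract_cds(toy_chrom, toy_blocks, "-"))
print(to_one_based(2479705, 2479831))
```

With a tool that expects 1-based inclusive positions, the first coding block of
NM_003820 would be requested as 2479706-2479831 rather than 2479705-2479831.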

On Wed, Sep 8, 2010 at 4:23 PM, Jennifer Jackson <[email protected]> wrote:

> Hello Lipika,
>
> It may help to understand the coordinate system used by UCSC. We use a
> 0-based start position and a 1-based end position, so intervals are
> half-open. This can get tricky, especially when converting to the (-)
> strand, since we also store all coordinates smallest->largest along the
> chromosome.
>
> Help is located in this wiki:
> http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
>
> All database tables/files will be formatted this way unless specifically
> noted in the data format FAQ:
> http://genome.ucsc.edu/FAQ/FAQformat.html
>
> There are utilities readily available that work with our coordinate system.
> Some function stand-alone and others require a database. If you do not run
> your own mirror, the public MySQL database can be used when a database is
> required.
>
> A list of utilities is here:
> http://hgwdev.cse.ucsc.edu/~larrym/utilities.html
>
> Many can be downloaded pre-compiled from here (for certain platforms):
> http://hgdownload.cse.ucsc.edu/admin/exe/
>
> Otherwise, obtain the source and compile locally:
> http://hgdownload.cse.ucsc.edu/downloads.html#source_downloads
>
> Public MySQL access instructions:
> http://genome.ucsc.edu/FAQ/FAQdownloads.html#download29
>
> Please feel free to contact the mailing list support team again if you
> would like more assistance.
>
> Warm regards,
>
> Jen
> UCSC Genome Browser Support
>
>
> On 9/8/10 11:35 AM, Lipika Ray wrote:
>
>>  Hello UCSC group,
>>
>> I would like to get the coding sequences of genes from RefSeq mRNA IDs
>> (such as NM_003820) in hg18, for a large list of such IDs.
>>
>> I am getting the exonStarts, exonEnds, cdsStart, and cdsEnd values from
>> the refFlat table in hg18.
>>
>> So for NM_003820, the record looks like this:
>>
>>   geneName: TNFRSF14
>>       name: NM_003820
>>      chrom: chr1
>>     strand: -
>>    txStart: 2479150
>>      txEnd: 2486613
>>   cdsStart: 2479705
>>     cdsEnd: 2486314
>>  exonCount: 8
>> exonStarts: 2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,
>>   exonEnds: 2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,
>>
>> To get the DNA sequence for the coding regions, I take the chr1.fa.gz
>> file from the hg18 chromosomes download and extract the sequence for
>> these regions:
>>
>> 2479705-2479831, 2480082-2480114, 2481163-2481306, 2482264-2482355,
>> 2483000-2483156, 2484510-2484636, 2485144-2485253, 2486245-2486314
>>
>> The resulting sequence does not match when I cross-check it against the
>> sequence from the web. Can you please tell me whether I can extract the
>> sequence this way, or whether you already have the sequences for genes
>> stored separately in your database?
>>
>> Thanks for your help.
>>
>> Lipika
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>
>
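The region list in the original question can be reproduced from the refFlat
record by clipping each exon to the cdsStart..cdsEnd range. A minimal sketch,
assuming the field values are copied from the record quoted above; parse_coords
and cds_blocks are hypothetical helper names, not part of any UCSC tool.

```python
def parse_coords(field):
    """Turn a refFlat comma-separated list (with trailing comma) into ints."""
    return [int(x) for x in field.split(",") if x]


def cds_blocks(exon_starts, exon_ends, cds_start, cds_end):
    """Clip each exon to the coding region.

    All coordinates are 0-based start, 1-based end (half-open), as stored in
    the table. Exons that lie entirely in a UTR are dropped.
    """
    blocks = []
    for start, end in zip(exon_starts, exon_ends):
        s = max(start, cds_start)
        e = min(end, cds_end)
        if s < e:  # keep only exons that overlap the coding region
            blocks.append((s, e))
    return blocks


# Values for NM_003820 (hg18) as given in the question.
exon_starts = parse_coords(
    "2479150,2480082,2481163,2482264,2483000,2484510,2485144,2486245,")
exon_ends = parse_coords(
    "2479831,2480114,2481306,2482355,2483156,2484636,2485253,2486613,")

blocks = cds_blocks(exon_starts, exon_ends, 2479705, 2486314)
total = sum(e - s for s, e in blocks)  # half-open, so length is end - start
print(blocks)
print(total)
```

Because the intervals are half-open, each block length is simply end minus
start. For this record the blocks sum to 852 bases, a multiple of 3, which is a
quick sanity check that the clipping is consistent before extracting sequence.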
