hi brooke,

thanks for the clarification.

just one question to make sure i fully understand the structure of the
refSeqAli.txt file: there are fields for describing the alignment blocks. is
it always the rule that each block is an exon? or do blocks simply denote
regions that align to the genome above the preselected threshold - and in
that case a certain exon may actually span several alignment blocks?
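
to make the question concrete, here is a minimal python sketch of how i
understand the genomic-to-mRNA mapping would work if blocks are simply
ungapped aligned segments. it rests on assumptions that may be wrong: that
the table dump has a leading bin column before the 21 standard psl fields,
that all coordinates are 0-based, and that for '-' strand alignments the
qStarts are given on the reverse-complemented mRNA (as the psl format FAQ
describes). the names parse_psl_line and genomic_to_mrna are made up for
illustration and are not part of any ucsc tool:

```python
from typing import List, NamedTuple, Optional


class Psl(NamedTuple):
    """The subset of psl fields needed for coordinate mapping."""
    strand: str
    q_name: str
    q_size: int
    t_name: str
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(text: str) -> List[int]:
    # psl list fields are comma-separated with a trailing comma, e.g. "50,50,"
    return [int(x) for x in text.rstrip(",").split(",")]


def parse_psl_line(line: str, has_bin: bool = True) -> Psl:
    """Parse one tab-separated psl row.

    has_bin=True is an assumption about the table dump: it skips a leading
    'bin' column that precedes the 21 standard psl fields.
    """
    f = line.rstrip("\n").split("\t")
    if has_bin:
        f = f[1:]
    return Psl(
        strand=f[8],
        q_name=f[9],
        q_size=int(f[10]),
        t_name=f[13],
        block_sizes=_int_list(f[18]),
        q_starts=_int_list(f[19]),
        t_starts=_int_list(f[20]),
    )


def genomic_to_mrna(psl: Psl, g_pos: int) -> Optional[int]:
    """Map a 0-based genomic position to a 0-based mRNA position.

    Returns None when the position lies in no alignment block, i.e. in an
    intron or in a gap between two blocks.
    """
    for size, q_start, t_start in zip(psl.block_sizes, psl.q_starts, psl.t_starts):
        if t_start <= g_pos < t_start + size:
            q = q_start + (g_pos - t_start)
            if psl.strand == "-":
                # qStarts refer to the reverse-complemented mRNA; flip back
                q = psl.q_size - 1 - q
            return q
    return None


if __name__ == "__main__":
    # made-up row, not a real refSeqAli record
    row = "\t".join([
        "585", "100", "0", "0", "0", "0", "0", "1", "950", "+", "NM_TEST",
        "100", "0", "100", "chr1", "5000", "1000", "2050", "2",
        "50,50,", "0,50,", "1000,2000,",
    ])
    aln = parse_psl_line(row)
    print(genomic_to_mrna(aln, 1010))  # inside the first block
    print(genomic_to_mrna(aln, 1500))  # between the two blocks
```

under this reading a position between two blocks maps to nothing, whether the
gap is a real intron or just a short insertion inside an exon, which is why i
am asking whether a block always equals an exon.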

thanks,
nimrod



On Tue, Jun 15, 2010 at 3:35 AM, Brooke Rhead <[email protected]> wrote:

> Hi Nimrod,
>
> Ah, sorry for misunderstanding what you are trying to do!
> Unfortunately, the person here who has done the most work on the SNP
> tracks and who could best answer your questions is not available for the
> next several weeks, but we still may be able to point you in the right
> direction.
>
> I should clarify that the snp130CodingDbSnp table was built using
> annotations directly from dbSNP, so, while there is a description of how
> we built it (located in src/hg/makeDb/doc/hg18.txt in the Genome Browser
> source code), it is likely not what you are looking for.  We could point you
> to the portion of the code that is used to generate the "UCSC's predicted
> function relative to selected gene tracks" portion of the SNP details page,
> if you think that would be useful to you.
>
> One major change to your process that I can suggest is to start with the
> refSeqAli table rather than the refGene table to determine the mRNA
> coordinate.  The refGene table is a gene prediction table created from
> refSeqAli, and alignment information present in refSeqAli is lost in
> refGene.  The refSeqAli table is in psl format (
> http://genome.ucsc.edu/FAQ/FAQformat.html#format2), which retains all of
> the alignment information, and will enable you to go from a genomic
> coordinate to the correct mRNA coordinate.
>
>
> --
> Brooke Rhead
> UCSC Genome Bioinformatics Group
>
>
> On 06/12/10 01:25, nimrod rubinstein wrote:
>
>> thanks for the quick response,
>>
>> actually i am using snp130, but in my data i also have SNPs that do not
>> exist in snp130. i guess what i am trying to do (explained in my last
>> email) is similar to what was performed in order to build the
>> snp130CodingDbSnp table. is there any description for that?
>>
>> thanks again,
>> nimrod
>>
>>
>>
>> On Sat, Jun 12, 2010 at 3:10 AM, Brooke Rhead <[email protected]> wrote:
>>
>>> Hi Nimrod,
>>>
>>> The snp130 table contains dbSNP's annotations on each SNP's predicted
>>> functional role (in the 'func' field), which includes whether the SNP is
>>> coding-synonymous, coding-nonsynonymous, in a 5' or 3' UTR, in an intron,
>>> just near a gene, etc.  (See the SNP 130 track description for a full
>>> list).  dbSNP uses RefSeq Genes to make these predictions.
>>>
>>> For determining the amino acid changes, I am happy to report that there
>>> is a somewhat new table in the hg18 database that already has the exact
>>> information you are looking to extract: snp130CodingDbSnp.
>>>
>>> This table is what the Genome Browser uses to display coding changes when
>>> you click on a SNP and look at the details page.  For instance, if you
>>> click on rs17852585 in the Genome Browser and scroll down, you will see:
>>>
>>> Coding annotations by dbSNP:
>>> NM_000808: missense L (CTC) --> P (CCC)
>>>
>>> (Note that you can also see predicted coding changes for *any* gene or
>>> gene prediction track by clicking "Go to SNPs (130) track controls" and
>>> making selections in the "On details page, show function and coding
>>> differences relative to..." boxes.  This information is not stored in any
>>> table -- it is generated on the fly when you click on a SNP.)
>>>
>>> I think that between the snp130 table and the snp130CodingDbSnp table,
>>> you should be able to find what you are looking for.  If you have any
>>> further questions, please feel free to write back to [email protected].
>>> And thank you for searching the mailing list archives before asking your
>>> question!
>>>
>>> --
>>> Brooke Rhead
>>> UCSC Genome Bioinformatics Group
>>>
>>>
>>> On 06/11/10 05:40, nimrod rubinstein wrote:
>>>
>>>> hi,
>>>>
>>>> i have a list of SNPs and their locations on hg18. i'd like to
>>>> use ucsc data to find out for each SNP whether it falls in a
>>>> known gene and if so in which of the following regions:
>>>> 5'utr/coding sequence/intron/3'utr. if it does fall inside the
>>>> coding sequence i would additionally like to know whether
>>>> it is a synonymous SNP or not, and if not, what the resulting
>>>> amino acid change is.
>>>>
>>>> i read through the mailing list archives and understood it's best to
>>>> use refGene and refMrna for this task: for a given SNP coordinate i
>>>> first check whether it falls inside any of refGene's transcription
>>>> boundaries. if it does, i then determine in which region of the
>>>> gene. if it falls inside one of the coding exons i then extract
>>>> the relevant codon from refMrna - and here's where i'm stuck:
>>>>
>>>> according to the coordinates in refGene i might determine that
>>>> the SNP is in e.g. the 5'utr, but according to the coordinates in the
>>>> CDS file it may turn out that it's actually in the coding
>>>> sequence, and the other way around (plus other similar
>>>> combinations of that problem concerning the 3'utr and intron
>>>> regions).
>>>>
>>>> i understand that the genomic coordinates in refGene are the
>>>> result of BLAT and those in the CDS file are local coordinates
>>>> from NCBI. since the mapping of NCBI mRNAs to the genome is
>>>> imperfect these location discrepancies occur.
>>>>
>>>> so, if my description is correct, is there any solution to my
>>>> problem? if i misunderstood or am doing something wrong i would
>>>> greatly appreciate your corrections.
>>>> thank you very much for your time and help
>>>> Nimrod Rubinstein
>>>> The Department of Cell Research and Immunology
>>>> Tel Aviv University
>>>> _______________________________________________
>>>> Genome maillist  -  [email protected]
>>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>>>>
>
>