Re: [Genome] About SNP classification

Angie Hinrichs Mon, 13 Jul 2009 11:22:06 -0700

Hi Abhi,

You can download our source tree as a zip file or via CVS, following 
the instructions in step 6, part d at this link:
  http://genome.ucsc.edu/admin/mirror.html#step6
(Note the link to instructions in 6d, and the README files listed at 
the beginning of those instructions.)  You will need the MySQL client 
libraries installed locally in order to build the libs.


If you run pslMap with no command line arguments, it prints out a mini 
man page.  I think the usage will be like this:

pslMap -swapMap mySnpsAsPSL.psl myMrnas.psl snpsInMrnas.psl

where mySnpsAsPSL.psl has genomic coordinates as target coords and 
single-base queries (the SNPs), myMrnas.psl is the alignment of mRNA 
query sequences to the target genome, and the output file 
snpsInMrnas.psl has the single-base SNP queries aligned to mRNA 
targets.  
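
Once pslMap has run, the mRNA offset of each SNP should be readable 
from the target columns of snpsInMrnas.psl.  Here is a minimal Python 
sketch, assuming the standard 21-column tab-separated PSL layout; the 
function name and the example row are made up for illustration and are 
not part of our source tree:

```python
# Sketch only: report each SNP's 0-based offset within its mRNA, reading
# rows shaped like pslMap output (SNP as query, mRNA as target).
# snp_mrna_offsets is a hypothetical helper name, not a UCSC program.

def snp_mrna_offsets(psl_lines):
    """Yield (snp_name, mrna_name, offset) for each single-base PSL row."""
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 21 or not fields[0].isdigit():
            continue  # skip optional header lines and malformed rows
        snp_name = fields[9]        # qName: the single-base SNP query
        mrna_name = fields[13]      # tName: the mRNA target
        t_start = int(fields[15])   # tStart: 0-based start in the mRNA
        t_end = int(fields[16])     # tEnd: half-open end in the mRNA
        if t_end - t_start == 1:    # keep only SNPs that mapped as one base
            yield snp_name, mrna_name, t_start


# Invented example row: SNP "mySnp1" lands at offset 122 of "myMrna1".
example = "\t".join([
    "1", "0", "0", "0", "0", "0", "0", "0", "+",
    "mySnp1", "1", "0", "1",
    "myMrna1", "2000", "122", "123",
    "1", "1,", "0,", "122,",
])
for row in snp_mrna_offsets([example]):
    print(row)
```

SNPs that fall in introns or outside any mRNA alignment would simply 
be absent from the output, so a missing row is informative too.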

PSL is the native output format of the blat alignment tool, which is 
also in our source tree.  We align RefSeq and GenBank cDNA transcripts 
to the genome nightly, and you can download PSL for those (ask if you 
need them; I assume you are already using your own mRNA alignments).  

The PSL format is documented here:

http://genome.ucsc.edu/FAQ/FAQformat#format2

If you happen to already have BED format for your SNPs, we have a 
couple of programs, bedToGenePred | genePredToFakePsl, that can make 
mySnpsAsPSL.psl.  Otherwise, you'll probably need to write a line of 
perl to convert your SNP coordinate format to PSL.  
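
If you would rather do the conversion in a script, a rough Python 
equivalent of that "line of perl" might look like the sketch below.  
It assumes your SNPs are already 0-based starts, treats each SNP as a 
perfectly matching 1-base query on the + strand, and takes chromosome 
sizes from a lookup that you supply; the function name, the SNP name 
and the size shown are placeholders, not real values:

```python
# Sketch only: write a "fake" 21-column PSL row for a single-base SNP.
# bed_snp_to_psl and chrom_sizes are hypothetical names for illustration.

def bed_snp_to_psl(chrom, chrom_start, name, chrom_sizes):
    """Return one tab-separated PSL line for a 1-base SNP.

    chrom_start is the 0-based start, as in BED.
    """
    fields = [
        1, 0, 0, 0,                  # matches, misMatches, repMatches, nCount
        0, 0, 0, 0,                  # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
        "+",                         # strand
        name, 1, 0, 1,               # qName, qSize, qStart, qEnd
        chrom, chrom_sizes[chrom],   # tName, tSize
        chrom_start, chrom_start + 1,  # tStart, tEnd (half-open)
        1,                           # blockCount
        "1,",                        # blockSizes
        "0,",                        # qStarts
        f"{chrom_start},",           # tStarts
    ]
    return "\t".join(str(f) for f in fields)


sizes = {"chr1": 1000000}  # placeholder size, NOT the real hg18 chr1 length
line = bed_snp_to_psl("chr1", 12344, "mySnp1", sizes)
print(line)
```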

A note about many of the formats used by UCSC including BED and PSL: 
coordinates are 0-based, half-open.  A simple way to put it is 
"0-based starts, 1-based ends".  

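To make the coordinate convention concrete, here is a small Python 
sketch; the function names are invented for illustration.  A SNP 
reported at 1-based position 12345 becomes start 12344, end 12345:

```python
# Sketch only: convert 1-based inclusive coordinates to the 0-based,
# half-open convention used by BED and PSL. Helper names are hypothetical.

def one_based_to_half_open(start_1based, end_1based):
    """Return (start, end) as 0-based start and 1-based (exclusive) end."""
    return start_1based - 1, end_1based


def half_open_length(start, end):
    """In half-open coordinates the length is simply end - start."""
    return end - start


start, end = one_based_to_half_open(12345, 12345)  # a single-base SNP
print(start, end, half_open_length(start, end))
```
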
Hope that helps and please let us know at [email protected] if you 
have more questions,

Angie


On Mon, 13 Jul 2009, Pratap, Abhishek wrote:


> Hello Angie
> 
> Thanks for a detailed explanation. I would be interested to check out
> pslMap and see how best I can use it for analysis. Could you send me the
> link to the same. Any existing documentation for the software will help
> too.
> 
> Best,
> -Abhi
> 
> -----Original Message-----
> From: Angie Hinrichs [mailto:[email protected]] 
> Sent: Wednesday, July 08, 2009 6:57 PM
> To: Pratap, Abhishek
> Cc: [email protected]
> Subject: Re: [Genome] About SNP classification
> 
> Hi Abhi,
> 
> At this point we do not have tools that do automated functional 
> annotation of SNPs.  We display dbSNP's functional annotations, and 
> our SNP details page can now annotate a single dbSNP item with respect 
> to a selected gene prediction track, but we don't have a program that 
> reads files of SNPs and gene (or mRNA) coords and writes out 
> annotations.  Such a tool (or at least components) might be available 
> elsewhere -- perhaps you could ask the Bio{Perl,Java,etc} community 
> about solutions too.
> 
> We do have a program pslMap that can help with #2 (mRNA transcript 
> offset from genomic coordinate), if you have PSL-format alignments of 
> the mRNA transcripts to the genome.  (Please email [email protected] 
> again if you would like help getting the program and/or PSL alignments 
> of RefSeq or GenBank mRNA transcripts.)  You will need to translate 
> your SNPs' genomic coords into PSL format (pretending some 1-base 
> sequence was aligned to the genome), and then pslMap can map those 
> fake genomic alignments through genomic-mRNA alignments into (fake) 
> mRNA alignments which will have the offsets that you need.  
> 
> Given the mRNA offset, and also the offset of the coding region in the 
> mRNA, it should be fairly straightforward to extract the codon that 
> contains the SNP mRNA offset, substitute and translate -- there are 
> many sequence-extraction and translation tools out there (and we have 
> our own, faFrag and faTrans).  
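
The codon step described in the quoted paragraph above could look 
roughly like this in Python.  It is only a sketch: it assumes the 
standard genetic code, a 0-based SNP offset in the mRNA, 0-based 
half-open coding-region bounds, and an alternate base already given on 
the mRNA strand.  The function name is made up, and the returned labels 
borrow the 'func' words quoted further down the thread:

```python
# Sketch only: classify a single-base substitution in an mRNA as
# synonymous or not. classify_snp is a hypothetical helper, not a UCSC tool.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(mrna, cds_start, cds_end, snp_offset, alt_base):
    """Classify the substitution of alt_base at mRNA offset snp_offset.

    cds_start and cds_end bound the coding region (0-based, half-open).
    """
    if not cds_start <= snp_offset < cds_end:
        return "non-coding"
    # Step back to the first base of the codon that contains the SNP.
    codon_start = snp_offset - (snp_offset - cds_start) % 3
    if codon_start + 3 > cds_end:
        return "incomplete-codon"
    ref_codon = mrna[codon_start:codon_start + 3].upper().replace("U", "T")
    pos = snp_offset - codon_start
    alt = alt_base.upper().replace("U", "T")
    alt_codon = ref_codon[:pos] + alt + ref_codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "coding-synon"
    return "nonsense" if alt_aa == "*" else "missense"


# Toy mRNA: 2-base 5' UTR, then ATG GCT TGG TAA (coding region 2..14).
toy = "GGATGGCTTGGTAA"
print(classify_snp(toy, 2, 14, 7, "C"))   # GCT -> GCC
print(classify_snp(toy, 2, 14, 5, "A"))   # GCT -> ACT
print(classify_snp(toy, 2, 14, 10, "A"))  # TGG -> TGA
```

Indels and codons that span an alignment gap are not handled here and 
would need separate treatment.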
> 
> Hope that helps, and please write us back at [email protected] if 
> you have more questions.
> 
> Angie
> 
> 
> On Wed, 8 Jul 2009, Pratap, Abhishek wrote:
> 
> > Hi Brooke
> > 
> > Thanks for the quick reply. It is good to know that someone had
> > previously asked about this problem. Sorry if my initial mail was not
> > clear; I meant our own SNP predictions from Next Generation Sequencing
> > data.
> > 
> > So as input we have the hg18/NCBI36.2 bp location of each SNP. We then
> > want to know whether a predicted SNP is syn/non-syn. Ideally we want to
> > implement this as an automated service.
> > 
> > Indeed, the thread explains the biology of how this could be done quite
> > nicely. What is not very clear is the second part of the explanation,
> > which talks about how this could be done in an automated fashion. I
> > have a couple of questions about the basic workflow and how the UCSC
> > Genome Browser data could be used.
> > 
> > When Jennifer talks about downloading the (mRNA and protein) sequence
> > and substituting the SNP base(s), the downstream analysis is more
> > oriented towards manual inspection of the results.
> > 
> > 1. Is there any specific method that could classify a particular SNP
> > position as syn/non-syn on the fly?
> > 2. Also, how do we substitute the SNP bp position in an mRNA
> > transcript? I believe the coordinates will differ due to
> > introns/intergenic regions.
> > 
> > Thanks for your patient reading. Your pointers in this direction will
> > help us a great deal.
> > 
> > Cheers!
> > -Abhi
> > 
> > 
> > 
> > 
> > 
> > -----Original Message-----
> > From: Brooke Rhead [mailto:[email protected]] 
> > Sent: Tuesday, July 07, 2009 9:31 PM
> > To: Pratap, Abhishek
> > Cc: [email protected]
> > Subject: Re: [Genome] About SNP classification
> > 
> > Hello Abhishek,
> > 
> > Thank you for searching the mailing list archives!
> > 
> > Are you referring to your own SNP predictions, or the SNPs in the SNP 
> > track on the Genome Browser (from dbSNP)?
> > 
> > If you are referring to the SNP track, you can find information on a 
> > SNP's predicted functional role in the 'func' field of the SNP table 
> > (such as the snp129 table for the hg18 assembly).  Note that 
> > non-synonymous changes are denoted by words other than
> > "non-synonymous".
> > 
> > A full description of the possible classifications for the 'func' 
> > field is on the SNP track details page 
> > (http://genome.ucsc.edu/cgi-bin/hgTrackUi?g=snp129).  Here is an
> > excerpt:
> > 
> > # Coding - Synonymous - no change in peptide for allele with respect
> > to reference assembly (coding-synon)
> > 
> > # Coding - Non-Synonymous - change in peptide for allele with respect
> > to reference assembly (nonsense, missense, frameshift)
> > 
> > These annotations are done by dbSNP, based on RefSeq Genes.
> > 
> > Indel information is in the 'class' field of the SNP table.  Again,
> > more details are available on the SNP track details page.  The
> > locations of SNPs in genomic coordinates are available in the chrom,
> > chromStart and chromEnd fields.  We do not have information on the
> > position of SNPs within genes; however, there are other annotations in
> > the 'func' field that describe a SNP's position in a gene, such as
> > locus region, intron, untranslated, splice site, etc.
> > 
> > If you are instead referring to your own SNP predictions, perhaps this
> > previous mailing list response is what you are looking for:
> > 
> > https://lists.soe.ucsc.edu/pipermail/genome/2009-May/018897.html
> > 
> > I hope this is helpful.  If you have further questions, please feel
> > free to email us again at [email protected].
> > 
> > --
> > Brooke Rhead
> > UCSC Genome Bioinformatics Group
> > 
> > 
> > On 07/07/09 08:45, Pratap, Abhishek wrote:
> > > Hi All
> > > 
> > >  
> > > 
> > > This might seem to be an old track question. However, I was not able
> > > to find a good answer in the mailing list archives.
> > > 
> > >  
> > > 
> > > For all our SNP predictions we would like to know whether they are
> > > synonymous or non-synonymous. If a SNP is non-synonymous/exonic, we
> > > would then like to find the position in the gene where the amino acid
> > > changes, and what it changes to. Information about indels would also
> > > help.
> > > 
> > >  
> > > 
> > > It will be great if you could give me some pointers. 
> > > 
> > >  
> > > 
> > >  
> > > 
> > > ----------------------------- 
> > > Abhishek Pratap 
> > > Bioinformatics Software Engineer 
> > > Institute for Genome Sciences <http://www.igs.umaryland.edu/>  
> > > School of Medicine, Univ of Maryland 
> > > 801, W. Baltimore Street, Baltimore, MD 21209 
> > > Ph: (+1)-410-706-2296 
> > > 
> > > Chair 
> > > RSG-Worldwide <http://iscbsc.org/rsg>  
> > > ISCB-Student Council <http://iscbsc.org/>  
> > > 
> > > 
> > > 
> > > 
> > > 
> > >  
> > > 
> > >  
> > > 
> > >  
> > > 
> > > Thanks,
> > > 
> > > -Abhi
> > > 
> > > _______________________________________________
> > > Genome maillist  -  [email protected]
> > > https://lists.soe.ucsc.edu/mailman/listinfo/genome
> > 
> 
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome