Re: [Genome] help: Exonic position map to Protein position

janeela khan Mon, 04 Apr 2011 15:43:01 -0700

Thank you so very much. It was very useful information for me. I wonder if I 
can also retrieve the exonic sequence for the pig genome?

> From: [email protected]
> Subject: Genome Digest, Vol 99, Issue 3
> To: [email protected]
> Date: Fri, 1 Apr 2011 12:00:12 -0700
> 
> Today's Topics:
> 
>    1. Re: help: Exonic position map to Protein position
>       (Vanessa Kirkup Swing)
>    2. Re: How to generate mapping between Ensembl and refseq
>       transcript IDs (Hiram Clawson)
>    3. Re: bedgraph data will not display points (Hiram Clawson)
>    4. Re: when is a query excessive. (Galt Barber)
>    5. Re: bedgraph data will not display points
>       (Lionel (Lee) Brooks 3rd)
>    6. Re: bedgraph data will not display points (Hiram Clawson)
>    7. protein families (Tom Traut)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Fri, 1 Apr 2011 09:38:36 -0700 (PDT)
> From: Vanessa Kirkup Swing <[email protected]>
> Subject: Re: [Genome] help: Exonic position map to Protein position
> To: janeela khan <[email protected]>
> Cc: UCSC genome <[email protected]>
> Message-ID:
>       <[email protected]>
> Content-Type: text/plain; charset=utf-8
> 
> Hi Janeela,
> 
> To figure out where in the exon the protein is translated from, you will need 
> to use the Table Browser. To get to the Table Browser, click on "Tables" 
> in the blue navigation bar.
> 
> Set the clade, genome, and assembly.
> 
> Then you will need to set the following:
> 
> group: Gene and Gene prediction tracks
> track: UCSC Genes
> table: knownGene
> region: genome
> identifiers (names/accessions): click on "paste list" and paste in the 
> identifiers following the instructions.
> output format: selected fields from primary and related tables
> 
> click "get output"
> 
> select the fields you want displayed.
> 
> click "get output"
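
The knownGene fields selected in the steps above (strand, cdsStart, cdsEnd, exonStarts, exonEnds) are enough to turn an exon-relative position into a protein position. The following is a minimal sketch, not the Table Browser's own method. It assumes the 0-based, half-open coordinates used by knownGene; the function names and the convention of a 1-based exon number plus a 0-based offset from the exon's 5' end are hypothetical choices made for illustration.

```python
def parse_blocks(field):
    """Parse a knownGene comma-separated coordinate list such as '100,300,'."""
    return [int(x) for x in field.split(",") if x]


def exon_offset_to_protein_pos(strand, cds_start, cds_end,
                               exon_starts, exon_ends,
                               exon_number, offset):
    """Map a base inside an exon to a 1-based amino-acid position.

    exon_number is 1-based in transcript order; offset is 0-based from the
    exon's 5' end (transcript direction). Returns None for UTR bases.
    """
    exons = sorted(zip(exon_starts, exon_ends))
    if strand == "-":
        exons.reverse()                      # transcript order on minus strand
    start, end = exons[exon_number - 1]
    if not 0 <= offset < end - start:
        raise ValueError("offset falls outside the exon")
    # Genomic 0-based coordinate of the base.
    pos = start + offset if strand == "+" else end - 1 - offset
    if not cds_start <= pos < cds_end:
        return None                          # base lies in a UTR
    # Count coding bases that precede pos in transcript order.
    coding_before = 0
    for s, e in exons:
        cs, ce = max(s, cds_start), min(e, cds_end)   # coding part of exon
        if cs >= ce:
            continue                         # exon is entirely UTR
        if s <= pos < e:
            coding_before += (pos - cs) if strand == "+" else (ce - 1 - pos)
            break
        coding_before += ce - cs
    return coding_before // 3 + 1


# Toy transcript: two exons, coding region from 150 to 350.
starts, ends = parse_blocks("100,300,"), parse_blocks("200,400,")
print(exon_offset_to_protein_pos("+", 150, 350, starts, ends, 2, 0))
```

The sketch counts coding bases only, so it does not account for transcripts with an incomplete coding sequence at the 5' end.
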
> 
> Hope this helps lead you in the right direction. If you have further 
> questions, please contact us at [email protected]
> 
> Vanessa Kirkup Swing
> UCSC Genome Bioinformatics Group
> 
> ----- Original Message -----
> From: "janeela khan" <[email protected]>
> To: "UCSC genome" <[email protected]>
> Sent: Thursday, March 31, 2011 8:21:24 AM
> Subject: [Genome] help: Exonic position map to Protein position
> 
> 
> 
> Dear All,
> Could you tell me how I can map certain positions in an exon to protein 
> positions? I do not have the exact genomic positions, but I have the gene 
> name and the relative position in an exon. Is there a way to map this 
> position to the protein?
> Thanks for the help in advance
> MvH/janeela
> 
>                                         
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Fri, 01 Apr 2011 09:41:00 -0700
> From: Hiram Clawson <[email protected]>
> Subject: Re: [Genome] How to generate mapping between Ensembl and
>       refseq  transcript IDs
> To: "Cook, Malcolm" <[email protected]>
> Cc: "'Rajasimha, Harsha \(NIH/NEI\) \[C\]'" <[email protected]>,
>       "'[email protected]'" <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Sorry Malcolm, there isn't a generic method for all genomes at UCSC.
> This is a most interesting example you have here.  Usually chrM at
> Ensembl is: "Mt"
> 
> Newer genome assemblies at UCSC are including two tables:
> ensemblLift
> ucscToEnsembl
> 
> Which allow translation of UCSC names to Ensembl names and
> coordinate conversions for haplotypes and other random bits that
> might be located in a different coordinate system.  For example:
> 
> $ hgsql -e "select * from ensemblLift;" hg19
> +-----------------+----------+
> | chrom           | offset   |
> +-----------------+----------+
> | HSCHR4_1        | 69170076 |
> | HSCHR17_1       | 43384863 |
> | HSCHR6_MHC_APD  | 28696603 |
> | HSCHR6_MHC_COX  | 28477796 |
> | HSCHR6_MHC_DBB  | 28696603 |
> | HSCHR6_MHC_MANN | 28696603 |
> | HSCHR6_MHC_MCF  | 28696603 |
> | HSCHR6_MHC_QBL  | 28696603 |
> | HSCHR6_MHC_SSTO | 28659142 |
> +-----------------+----------+
> 
> $ hgsql -e "select * from ucscToEnsembl;" hg19 | grep MHC
> chr6_ssto_hap7  HSCHR6_MHC_SSTO
> chr6_qbl_hap6   HSCHR6_MHC_QBL
> chr6_mcf_hap5   HSCHR6_MHC_MCF
> chr6_mann_hap4  HSCHR6_MHC_MANN
> chr6_cox_hap2   HSCHR6_MHC_COX
> chr6_dbb_hap3   HSCHR6_MHC_DBB
> chr6_apd_hap1   HSCHR6_MHC_APD
> 
> It would be a useful process to go back over some of the older popular
> genomes to add these conversion tables.
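
As an illustration of how the two tables could be combined, here is a minimal sketch using only the rows shown in the hg19 output above. The function name is hypothetical, and the direction of the offset is an assumption: the sketch adds the offset to a UCSC position to obtain the Ensembl position, which should be checked against the table's schema description before use.

```python
# Rows copied from the ucscToEnsembl query output above.
UCSC_TO_ENSEMBL = {
    "chr6_ssto_hap7": "HSCHR6_MHC_SSTO",
    "chr6_qbl_hap6": "HSCHR6_MHC_QBL",
    "chr6_mcf_hap5": "HSCHR6_MHC_MCF",
    "chr6_mann_hap4": "HSCHR6_MHC_MANN",
    "chr6_cox_hap2": "HSCHR6_MHC_COX",
    "chr6_dbb_hap3": "HSCHR6_MHC_DBB",
    "chr6_apd_hap1": "HSCHR6_MHC_APD",
}

# Rows copied from the ensemblLift query output above.
ENSEMBL_LIFT = {
    "HSCHR4_1": 69170076,
    "HSCHR17_1": 43384863,
    "HSCHR6_MHC_APD": 28696603,
    "HSCHR6_MHC_COX": 28477796,
    "HSCHR6_MHC_DBB": 28696603,
    "HSCHR6_MHC_MANN": 28696603,
    "HSCHR6_MHC_MCF": 28696603,
    "HSCHR6_MHC_QBL": 28696603,
    "HSCHR6_MHC_SSTO": 28659142,
}


def ucsc_to_ensembl(chrom, pos):
    """Translate a UCSC (chrom, pos) pair into Ensembl naming.

    ASSUMPTION: the ensemblLift offset is added to the UCSC position to get
    the Ensembl position. Sequences without an offset row are left unshifted.
    """
    if chrom not in UCSC_TO_ENSEMBL:
        raise KeyError(f"no Ensembl name recorded for {chrom}")
    name = UCSC_TO_ENSEMBL[chrom]
    return name, pos + ENSEMBL_LIFT.get(name, 0)


print(ucsc_to_ensembl("chr6_cox_hap2", 100))
```
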
> 
> --Hiram
> 
> Cook, Malcolm wrote:
> > Hiram,
> > 
> > Is there a similar approach for chromosomal identifiers?  (i.e. chrM in dm3 
> > is dmel_mitochondrion_genome at Ensembl)
> > 
> > Or better, an SQL query for same?
> > 
> > Thx
> > 
> > Malcolm Cook
> > Stowers Institute for Medical Research -  Bioinformatics
> > Kansas City, Missouri  USA
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Fri, 01 Apr 2011 09:49:16 -0700
> From: Hiram Clawson <[email protected]>
> Subject: Re: [Genome] bedgraph data will not display points
> To: Lionel Brooks <[email protected]>
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Good Morning Lionel:
> 
> The bedGraph drawing mechanism can construct bar graphs at your
> specified intervals, or when you select graphType=points it will
> draw only the top of the bar graph at your specified intervals.
> There is no line drawing except by the trick of "smoothing" points
> such that they appear to be in a line graph.  This only works if
> the data points are continuous when seen in the genome browser.
> Smoothing will not smear points into areas where there is no
> data value specified.
> 
> The Genome Graphs function of the genome browser:
>       http://genome.ucsc.edu/cgi-bin/hgGenome
> will only draw lines between your specified points.
> 
> See also:
> 
> http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
> 
> --Hiram
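
Because smoothing only joins points where the data are continuous, one way to investigate a track that will not smooth into a line is to look for uncovered stretches between consecutive intervals. The following is a minimal sketch of such a check, assuming the four-column bedGraph layout (chrom, start, end, value) with non-overlapping intervals; the function name is hypothetical.

```python
def find_gaps(lines):
    """Return (chrom, gap_start, gap_end) for each uncovered stretch between
    consecutive bedGraph intervals on the same chromosome."""
    intervals = []
    for line in lines:
        fields = line.split()
        # Skip comments, track/browser lines, and anything not four columns.
        if (len(fields) != 4 or fields[0] in ("track", "browser")
                or fields[0].startswith("#")):
            continue
        chrom, start, end, _value = fields
        intervals.append((chrom, int(start), int(end)))
    intervals.sort()
    gaps = []
    for (c1, _s1, e1), (c2, s2, _e2) in zip(intervals, intervals[1:]):
        if c1 == c2 and s2 > e1:
            gaps.append((c1, e1, s2))
    return gaps


example = [
    "track type=bedGraph autoScale=on graphType=points",
    "chr1 0 10 5",
    "chr1 10 20 7",
    "chr1 50 60 9",
    "chr2 5 15 3",
]
print(find_gaps(example))
```

An empty result means every interval on a chromosome starts where the previous one ends, which is the condition under which smoothed points can appear as a line.
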
> 
> Lionel Brooks wrote:
> > Hello all,
> > 
> > I have a bedgraph file.  In the past I have used such files to obtain 
> > graphic output in the form of a smoothed line, but I uploaded my most 
> > recent data set and now I cannot get a line graph.  In fact, I'm not 
> > sure what I am looking at because the values that are displayed along 
> > the y-axis are not described with a label. 
> > 
> > Here is my track line:
> > track type=bedGraph autoScale=on graphType=points windowingFunction=mean 
> > smoothingWindow=16
> > 
> > My data format is
> > 
> > chr   coordA   coordB   value
> > 
> > where the approximate range of data values is: 5 <= value <= 500.
> > Is it possible that your plotting function cannot compute this line 
> > because my coordinate intervals are too small?
> > Another possibly relevant issue may be that the coordinate intervals are 
> > not fixed length.
> > 
> > Any suggestions for course of action would be great.
> > 
> > Sincerely,
> > Lionel
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Fri, 01 Apr 2011 10:25:14 -0700
> From: Galt Barber <[email protected]>
> Subject: Re: [Genome] when is a query excessive.
> To: John Hayward <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> 
> Hi, John!
> 
> Queries that take more than a few minutes to run are
> probably inappropriate for the shared public mysql server.
> 
> I found this query formulation for you that takes less than one minute:
> 
> select name, observed, count(*) from(
> (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16') 
> union
> (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16') 
> union
> (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16') 
> union
> (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
> ) resultAlias group by name, observed having count(*) = 4 limit 30;
> 
> +-----------+----------+----------+
> | name      | observed | count(*) |
> +-----------+----------+----------+
> | rs1000014 | A/G      |        4 |
> | rs1000047 | C/T      |        4 |
> | rs1000077 | C/G      |        4 |
> | rs1000078 | A/G      |        4 |
> | rs1000100 | A/T      |        4 |
> | rs1000174 | A/G      |        4 |
> | rs1000178 | C/T      |        4 |
> | rs1000192 | A/G      |        4 |
> | rs1000193 | A/C      |        4 |
> | rs1000454 | C/G      |        4 |
> | rs1000455 | A/T      |        4 |
> | rs1000710 | G/T      |        4 |
> | rs1000711 | C/G      |        4 |
> | rs1000720 | A/G      |        4 |
> | rs1000742 | C/T      |        4 |
> | rs1001157 | A/G      |        4 |
> | rs1001170 | G/T      |        4 |
> | rs1001171 | A/T      |        4 |
> | rs1001302 | A/G      |        4 |
> | rs1001362 | C/T      |        4 |
> | rs1001366 | C/T      |        4 |
> | rs1001493 | C/T      |        4 |
> | rs1001554 | A/G      |        4 |
> | rs1001608 | C/T      |        4 |
> | rs1001631 | C/G      |        4 |
> | rs1001655 | A/G      |        4 |
> | rs1001722 | G/T      |        4 |
> | rs1001776 | C/T      |        4 |
> | rs1001861 | A/G      |        4 |
> | rs1001871 | C/G      |        4 |
> +-----------+----------+----------+
> 30 rows in set (46.17 sec)
> 
> Of course for your own full output,
> you would remove the "limit" clause.
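
The query works because each inner select tags its rows with a population label, so the union keeps one row per (name, observed, population), and a group of exactly four rows means the SNP appears with the same observed alleles in all four tables. Here is a minimal sketch of the same pattern on toy data, using Python's sqlite3 module rather than the public MySQL server. The rows are invented for illustration, and the selects are written without parentheses because SQLite does not accept parenthesized members in a compound select.

```python
import sqlite3

POPULATIONS = ("CEU", "YRI", "CHB", "JPT")


def shared_snps(rows_by_pop, chrom):
    """Return (name, observed) pairs present in every population table.

    rows_by_pop maps a population code to a list of
    (chrom, name, observed) tuples; the data here is hypothetical.
    """
    con = sqlite3.connect(":memory:")
    for pop in POPULATIONS:
        con.execute(
            f"create table hapmapSnps{pop} "
            "(chrom text, name text, observed text)")
        con.executemany(
            f"insert into hapmapSnps{pop} values (?, ?, ?)",
            rows_by_pop[pop])
    # One tagged select per table, combined with UNION as in the query above.
    union = " union ".join(
        f"select name, observed, '{pop}' as pop "
        f"from hapmapSnps{pop} where chrom = ?"
        for pop in POPULATIONS)
    query = (f"select name, observed from ({union}) as resultAlias "
             "group by name, observed having count(*) = ? order by name")
    params = [chrom] * len(POPULATIONS) + [len(POPULATIONS)]
    result = con.execute(query, params).fetchall()
    con.close()
    return result


toy = {
    "CEU": [("chr16", "rs1", "A/G"), ("chr16", "rs2", "C/T")],
    "YRI": [("chr16", "rs1", "A/G"), ("chr16", "rs2", "C/T")],
    "CHB": [("chr16", "rs1", "A/G"), ("chr16", "rs2", "C/G")],
    "JPT": [("chr16", "rs1", "A/G"), ("chr16", "rs2", "C/T")],
}
print(shared_snps(toy, "chr16"))
```

Note that SQLite compares strings case-sensitively by default, so the chromosome name passed in must match the stored value exactly.
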
> 
> In case you are curious how many there are:
> 
> select count(*) from (
> select name, observed, count(*) from(
> (select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16') 
> union
> (select name, observed, 'YRI' from hapmapSnpsYRI where chrom = 'Chr16') 
> union
> (select name, observed, 'CHB' from hapmapSnpsCHB where chrom = 'Chr16') 
> union
> (select name, observed, 'JPT' from hapmapSnpsJPT where chrom = 'Chr16')
> ) resultAlias group by name, observed having count(*) = 4) resultAlias2;
> +----------+
> |   105841 |
> +----------+
> 1 row in set (47.60 sec)
> 
> 
> Another alternative would be to capture the output from each like this:
> 
> select name, observed, 'CEU' from hapmapSnpsCEU where chrom = 'Chr16'
> 
> for each of your 4 files.
> You could sort them by name (rsId) either with an order by clause in 
> sql, or with the unix sort command.
> 
> You can even use the unix join command to join them up on the name and 
> observed fields.
> 
> Once the contents of each of the 4 sets are sorted by name and observed,
> joining them can be very fast.
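
For readers who prefer a script to the unix sort and join commands, the same result can be reached with a set intersection, which is a different technique from the sorted join described above and needs no sorting of the inputs. This is a minimal sketch, assuming each per-table output was saved as tab-separated lines with name and observed in the first two columns; the function names are hypothetical.

```python
def parse_pairs(lines):
    """Collect (name, observed) pairs from tab-separated lines."""
    pairs = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def shared_pairs(*sources):
    """Return the sorted pairs that occur in every source.

    Each source is any iterable of lines, for example an open file.
    """
    sets = [parse_pairs(src) for src in sources]
    return sorted(set.intersection(*sets)) if sets else []


ceu = ["rs1\tA/G\tCEU\n", "rs2\tC/T\tCEU\n"]
yri = ["rs1\tA/G\tYRI\n", "rs2\tC/T\tYRI\n"]
chb = ["rs1\tA/G\tCHB\n", "rs2\tC/G\tCHB\n"]
jpt = ["rs1\tA/G\tJPT\n", "rs2\tC/T\tJPT\n"]
print(shared_pairs(ceu, yri, chb, jpt))
```

This holds all pairs in memory, so the sort-and-join route may be preferable for very large outputs.
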
> 
> -Galt
> 
> 4/1/2011 8:25 AM, John Hayward:
> >   I would like to run queries against the genome-mysql.cse.ucsc.edu 
> > database which may be excessive and don't want to cause problems for others.
> >
> > I want to find matches for a particular chromosome which have the same name 
> > and observation for tables hapmapSnpsCEU, hapmapSnpsYRI, hapmapSnpsCHB, 
> > hapmapSnpsJPT.
> >
> > A query to pick up the count of hapmapSnpsCEU for one chromosome took 
> > 0.14 seconds.
> > A query to pick up the count joining hapmapSnpsCEU and hapmapSnpsCHB 
> > took 8.40 seconds.
> >
> > If I join all tables would that constitute an excessive load?
> >
> > Below is the query joining two tables.
> > ======
> > select count(*) from hapmapSnpsCEU, hapmapSnpsCHB where hapmapSnpsCEU.chrom 
> > = 'Chr16' and hapmapSnpsCHB.chrom = 'Chr16' and hapmapSnpsCEU.name = 
> > hapmapSnpsCHB.name and hapmapSnpsCEU.observed = hapmapSnpsCHB.observed;
> > ======
> > johnh...
> >
> >
> 
> 
> 
> ------------------------------
> 
> Message: 5
> Date: Fri, 01 Apr 2011 14:16:42 -0400
> From: "Lionel (Lee) Brooks 3rd" <[email protected]>
> Subject: Re: [Genome] bedgraph data will not display points
> To: Hiram Clawson <[email protected]>
> Cc: [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Hi Hiram,
> 
> From 
> http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
> 
>    1. Pseudo /line graphs/ can be drawn with the wiggle tracks by
>       setting optional drawing parameters in the display of the track to
>       draw /points/ instead of bars with smoothing on to smear the
>       points together into a line.
> 
> The pseudo line graph functionality is what I desire.
> Previously, it had been possible to do this with bedgraph format files.
> I don't know what "smearing" means.  I'm just looking for a quick way to 
> draw the moving average as I had been able to do before.
> As I mentioned below, my track line is:
> track type=bedGraph autoScale=on graphType=points windowingFunction=mean 
> smoothingWindow=16
> 
> I suppose my solution is to modify my scripts to use the wiggle variable 
> step format?
> 
> 
> thanks,
> -Lionel
> 
> 
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Fri, 1 Apr 2011 11:18:07 -0700 (PDT)
> From: Hiram Clawson <[email protected]>
> Subject: Re: [Genome] bedgraph data will not display points
> To: "Lionel (Lee) Brooks 3rd" <[email protected]>
> Cc: [email protected]
> Message-ID:
>       <[email protected]>
> Content-Type: text/plain; charset=utf-8
> 
> It won't make any difference what type of wiggle format you choose.
> They all draw the same way.
> 
> You are going to have to provide me with a URL to your data file
> so I can see what it looks like.
> 
> --Hiram
> 
> 
> 
> ------------------------------
> 
> Message: 7
> Date: Fri, 1 Apr 2011 14:34:07 -0400
> From: "Tom Traut" <[email protected]>
> Subject: [Genome] protein families
> To: [email protected]
> Message-ID: <p06240804c9bbcac80186@[152.19.36.114]>
> Content-Type: text/plain; charset=us-ascii; format=flowed
> 
> Can I use your site (or any other) to find a listing of major protein 
> families?
> 
> how many kinases
> how many proteases
> how many G proteins
> 
> etc
> -- 
> Tom Traut
> 
> Professor of Biochemistry & Biophysics
> 
> Phone:        919 966-5044
> FAX:  919 966-2852
> URL:  www.unc.edu/~traut
> 
> 
> 
> ------------------------------
> 
> 
> 
> End of Genome Digest, Vol 99, Issue 3
> *************************************
                                          