Thanks, Luvina. I'll disregard the output and perform the join myself. Best, Kathleen
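Luvina recommends writing a script (she suggests Perl) to join hgFixed.gnfHumanAtlas2All to hgFixed.gnfHumanAtlas2AllExps instead of relying on the Table Browser join. Below is a minimal Python sketch of that join. It is not an official UCSC tool.

It rests on several assumptions:
- Position i in a probe's expScores list corresponds to the row with gnfHumanAtlas2AllExps.id == i. This follows from Jen's and Mary's explanations in the quoted thread.
- Both tables were saved from the Table Browser as tab-separated text with `#`-prefixed header lines.
- The AllExps export has `id` in its first column and `name` in its second.
- The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, List, Tuple


def _rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield tab-separated fields, skipping blank lines and '#' header lines."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row and not row[0].startswith("#"):
            yield row


def read_exps(lines: Iterable[str]) -> Dict[int, str]:
    """Map experiment id -> name from a gnfHumanAtlas2AllExps export.

    Assumes column 0 holds `id` and column 1 holds `name`.
    """
    return {int(row[0]): row[1] for row in _rows(lines)}


def parse_scores(exp_scores: str) -> List[float]:
    """Split an expScores field; the trailing comma leaves an empty item, which is dropped."""
    return [float(v) for v in exp_scores.split(",") if v.strip()]


def join_probe(exp_scores: str, exp_count: int,
               exps: Dict[int, str]) -> List[Tuple[int, str, float]]:
    """Pair score i with experiment id i (assumed positional correspondence)."""
    scores = parse_scores(exp_scores)
    if len(scores) != exp_count:
        raise ValueError(f"expCount is {exp_count} but {len(scores)} scores were found")
    missing = [i for i in range(exp_count) if i not in exps]
    if missing:
        raise ValueError(f"no experiment name for ids {missing}")
    return [(i, exps[i], scores[i]) for i in range(exp_count)]


def join_all(score_lines: Iterable[str],
             exps: Dict[int, str]) -> Iterator[Tuple[str, int, str, float]]:
    """Yield one (probe, id, name, score) row per value in a gnfHumanAtlas2All export.

    Assumes the columns are name, expCount, expScores, as in the Table Browser
    output quoted in this thread.
    """
    for row in _rows(score_lines):
        probe, exp_count, exp_scores = row[0], int(row[1]), row[2]
        for exp_id, name, score in join_probe(exp_scores, exp_count, exps):
            yield probe, exp_id, name, score


if __name__ == "__main__":
    # Tiny inline example; real use would pass file handles opened with newline="".
    exps = read_exps(["#id\tname\n", "0\tColorectalAdenocarcinoma\n",
                      "1\tColorectalAdenocarcinoma 2\n"])
    for out in join_all(["#name\texpCount\texpScores\n", "1007_s_at\t2\t3621,3212,\n"], exps):
        print(*out, sep="\t")
```

With real exports, each probe expands to 158 rows, one per replicate. The length and id checks make a mismatch like the one seen in the Table Browser join fail loudly instead of passing silently.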
On Mon, Jun 13, 2011 at 2:40 PM, Luvina Guruvadoo <[email protected]> wrote: > Hi Kathleen, > > To answer your questions: > > 1. 'Original order' refers to the order in which we received the data from > GNF. > 2. The Table Browser is not performing the join correctly, so you can > disregard the output. We recommend writing a Perl script to perform the > join. Please feel free to contact us if you need help doing this. > > Regards, > --- > Luvina Guruvadoo > UCSC Genome Bioinformatics Group > > > > kathleen askland wrote: >> >> ---------- Forwarded message ---------- >> From: Mary Goldman <[email protected]> >> Date: Fri, Jun 10, 2011 at 12:40 PM >> Subject: Re: [Genome] GNF Atlas 2 data structure >> To: kathleen askland <[email protected]> >> >> >> Hi Kathleen, >> >> I would appreciate it if you resend your email to [email protected]. >> This will put your question into our tracking system, allowing our >> whole team to work on answering your question. Additionally, other >> users with similar issues will be able to benefit from your question >> and our answers. Thank you for your understanding in this matter. >> >> Much thanks, >> Mary >> ------------------ >> Mary Goldman >> UCSC Bioinformatics Group >> >> On 6/10/11 9:33 AM, kathleen askland wrote: >> >>> >>> Hello Mary, >>> Thank you so much for getting back to me and for your explanation. >>> I do have a couple of follow-up questions based on the data I have >>> downloaded. >>> 1) You note that the hgFixed.gnfHumanAtlas2AllExps table has the tissues >>> listed in the 'original' order. Just curious what is meant by >>> 'original' (or are you just contrasting it with the changed order used for >>> the hgFixed.gnfHumanAtlas2MedianExps table?) >>> 2) You note that the hgFixed.gnfHumanAtlas2AllExps table has the tissues >>> listed with replicates side-by-side: A,A,B,B... and that this connects >>> to the expScores in the hgFixed.gnfHumanAtlas2All table.
However, if I >>> select hgFixed.gnfHumanAtlas2All as the primary >>> table and join the 'id' and 'name' fields from the >>> hgFixed.gnfHumanAtlas2AllExps table to it, the output (at least in the >>> UCSC Browser window) is very unclear. First, not all 158 tissue >>> replicate names are listed in the 'name' field (thus there are >>> fewer tissue names than there are expression values) and, second, the >>> tissue names are in a different order than would be expected >>> according to the hgFixed.gnfHumanAtlas2AllExps table (i.e., starting >>> with 0=ColorectalAdenocarcinoma, 1=ColorectalAdenocarcinoma 2, etc.). >>> Is there an error in the join function? >>> Thanks again for your time and help. >>> Kathleen >>> >>> >>> On Wed, Jun 1, 2011 at 1:04 PM, Mary Goldman <[email protected]> wrote: >>> >>>> >>>> Hi Kathleen, >>>> >>>> Thank you for your patience while we worked on your question! >>>> >>>> The table hgFixed.gnfHumanAtlas2AllExps has the tissues listed in the >>>> original order (with replicates being side-by-side: A,A,B,B,C,C, etc.). This >>>> is the table that connects the expScores in >>>> hgFixed.gnfHumanAtlas2All to tissue types >>>> and contains the data you want. >>>> >>>> The table hgFixed.gnfHumanAtlas2MedianExps was made to connect with tables >>>> that had only the median of the two replicates (like gnfAtlas2). When this >>>> was done, the tissues were also reordered to group similar tissue types. >>>> When the output format "microarray names" is chosen for gnfAtlas2, it >>>> obtains the tissue names from this table >>>> (hgFixed.gnfHumanAtlas2MedianExps). >>>> >>>> Unfortunately, at this point in time, if you select the track GNF Atlas 2 >>>> from the table browser, it will not let you select >>>> hgFixed.gnfHumanAtlas2AllExps; we are working on this and hope to have a >>>> fix out soon.
To get the data from the hgFixed.gnfHumanAtlas2AllExps table, >>>> you will need to select the following in the table browser: >>>> >>>> group: All Tables >>>> database: hgFixed >>>> table: hgFixed.gnfHumanAtlas2AllExps >>>> >>>> I hope this information is helpful. Please feel free to contact the mailing >>>> list again if you require further assistance. >>>> >>>> Best, >>>> Mary >>>> ------------------ >>>> Mary Goldman >>>> UCSC Bioinformatics Group >>>> >>>> >>>> >>>> On 5/17/11 9:54 AM, kathleen askland wrote: >>>> >>>>> >>>>> Hello Jen, >>>>> >>>>> I wrote to you about a year ago with a question about GNF Atlas 2 expression >>>>> data that I downloaded using the UCSC Table Browser. I've come >>>>> back to this data for a different project and was reviewing our >>>>> correspondence (see previous emails at the bottom of the page) and checking it >>>>> against some downloaded data. There seems to be a significant >>>>> discrepancy that I hope you can clarify. >>>>> >>>>> Essentially, I want to be certain that I know which tissue and >>>>> replicate each of the expression values in the output file corresponds >>>>> to.
>>>>> So, I downloaded the GNF Atlas 2 absolute expression values for both >>>>> original samples/replicates by opening the Table Browser and >>>>> proceeding as follows: >>>>> 1) I selected Clade: Mammal, Genome: Human, Assembly: Feb 2009, Group: >>>>> Expression, Track: GNF Atlas 2, Table: hgFixed.gnfHumanAtlas2All >>>>> 2) Next, I selected output format: 'all fields from selected table' >>>>> 3) Then I clicked 'Get output', which opens an HTML window with the >>>>> requested data, the first two lines of which are as follows: >>>>> >>>>> #name expCount expScores >>>>> 1007_s_at 158 >>>>> >>>>> 3621,3212,1078,1130,475,408,375,528,668,482,543,392,745,996,696,649,1124,1259,291,451,707,745,1022,1296,2956,2359,1462,2318,1157,1437,1662,841,1288,1575,3465,2565,1281,1504,1203,1415,1919,1330,292,112,1039,1498,1868,1679,1855,2219,2701,3162,3561,2943,3455,4784,4332,4136,3441,3333,3043,2922,3291,4413,2727,5157,3332,3064,6515,6949,4237,5045,1896,1810,2531,2425,2542,2070,8931,9319,4300,4765,2586,2623,3334,5043,1872,2320,1515,2165,2561,2859,5122,5007,1572,1717,5614,5501,4380,4137,2087,2416,4298,4484,1867,2184,2081,1932,5530,6309,1077,1149,3709,1832,2859,8037,1718,1876,1303,1537,1441,925,864,978,1571,1110,2494,1825,4551,2741,1588,1161,726,1428,1434,1005,1687,1509,775,996,930,1187,768,800,1110,1114,1436,1281,1211,1171,1225,1455,2559,2741,3083,4111,2179,2653, >>>>> >>>>> 1053_at 158 >>>>> >>>>> 
1041,522,265,351,222,244,519,248,272,247,297,538,191,60,195,102,390,635,526,384,510,700,549,657,1436,1441,316,253,301,228,530,905,757,530,247,296,228,301,182,229,175,99,453,329,239,130,30,32,29,79,147,75,42,104,74,112,142,121,50,76,98,28,119,124,24,129,24,109,30,194,110,48,122,19,17,172,27,158,221,60,38,231,17,60,378,242,170,318,54,212,17,74,42,170,30,126,224,199,136,123,153,135,155,25,293,396,303,214,270,145,159,31,62,95,118,111,153,122,57,171,174,214,73,30,29,106,16,225,67,24,131,48,76,28,172,46,70,35,34,117,29,75,22,25,59,97,21,72,38,127,130,74,156,31,31,17,55,33, >>>>> >>>>> Since this
particular table does not have the expression IDs or >>>>> descriptive names, I do not know which tissues/replicates each of the >>>>> 158 values for each probe corresponds to. So, my first question is: >>>>> Are the expression values in order of the tissue ID with each pair of >>>>> replicates adjacent to one another (i.e., 0,0,1,1,2,2,3,3, etc.), or >>>>> ordered by tissue ID for the first replicate and then by tissue ID for the second >>>>> replicate (i.e., 0,1,2,3,...; 0,1,2,3,...), or in some other order? >>>>> >>>>> Finally, I want to be sure that the tissue IDs listed in the table >>>>> 'hgFixed.gnfHumanAtlas2MedianExps' (pasted below) are the same tissue >>>>> IDs that I should be using to reference the absolute expression data >>>>> provided in the 'hgFixed.gnfHumanAtlas2All' table. I ask this, in >>>>> particular, because your correspondence of March 30, 2010 indicated: " >>>>> >>>>>> >>>>>> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields >>>>>> are: >>>>>> >>>>>> id name >>>>>> >>>>>> 0 ColorectalAdenocarcinoma >>>>>> >>>>>> 1 ColorectalAdenocarcinoma 2 " >>>>>> >>>>> >>>>> which is different from the tissue ID-tissue description matches >>>>> listed when I select and output the 'hgFixed.gnfHumanAtlas2MedianExps' >>>>> table, for which I get the following list: >>>>> >>>>> #id description >>>>> 0 fetal brain >>>>> 1 whole brain >>>>> 2 temporal lobe >>>>> 3 parietal lobe >>>>> 4 occipital lobe >>>>> 5 prefrontal cortex >>>>> 6 cingulate cortex >>>>> 7 cerebellum >>>>> 8 cerebellum peduncles >>>>> 9 amygdala >>>>> 10 hypothalamus >>>>> 11 thalamus >>>>> 12 subthalamic nucleus >>>>> 13 caudate nucleus >>>>> 14 globus pallidus >>>>> 15 olfactory bulb >>>>> 16 pons >>>>> 17 medulla oblongata >>>>> 18 spinal cord >>>>> 19 ciliary ganglion >>>>> 20 trigeminal ganglion >>>>> 21 superior cervical ganglion >>>>> 22 dorsal root ganglion >>>>> 23 thymus >>>>> 24 tonsil >>>>> 25 lymph node >>>>> 26 bone marrow >>>>> 27 BM-CD71+ early erythroid >>>>> 28 BM-CD33+
myeloid >>>>> 29 BM-CD105+ endothelial >>>>> 30 BM-CD34+ >>>>> 31 whole blood >>>>> 32 PB-BDCA4+ dentritic cells >>>>> 33 PB-CD14+ monocytes >>>>> 34 PB-CD56+ NKCells >>>>> 35 PB-CD4+ Tcells >>>>> 36 PB-CD8+ Tcells >>>>> 37 PB-CD19+ Bcells >>>>> 38 leukemia lymphoblastic(molt4) >>>>> 39 721 B lymphoblasts >>>>> 40 lymphoma Burkitts Raji >>>>> 41 leukemia promyelocytic(hl60) >>>>> 42 lymphoma Burkitts Daudi >>>>> 43 leukemia chronic myelogenous(k562) >>>>> 44 colorectal adenocarcinoma >>>>> 45 appendix >>>>> 46 skin >>>>> 47 adipocyte >>>>> 48 fetal thyroid >>>>> 49 thyroid >>>>> 50 pituitary gland >>>>> 51 adrenal gland >>>>> 52 adrenal cortex >>>>> 53 prostate >>>>> 54 salivary gland >>>>> 55 pancreas >>>>> 56 pancreatic islets >>>>> 57 atrioventricular node >>>>> 58 heart >>>>> 59 cardiac myocytes >>>>> 60 skeletal muscle >>>>> 61 tongue >>>>> 62 smooth muscle >>>>> 63 uterus >>>>> 64 uterus corpus >>>>> 65 trachea >>>>> 66 bronchial epithelial cells >>>>> 67 fetal lung >>>>> 68 lung >>>>> 69 kidney >>>>> 70 fetal liver >>>>> 71 liver >>>>> 72 placenta >>>>> 73 testis >>>>> 74 testis Leydig cell >>>>> 75 testis germ cell >>>>> 76 testis interstitial >>>>> 77 testis seminiferous tubule >>>>> 78 ovary >>>>> >>>>> Thank you for any assistance you may provide. >>>>> >>>>> Kathleen >>>>> >>>>> >>>>> On Tue, Mar 30, 2010 at 3:51 PM, Jennifer Jackson <[email protected]> >>>>> wrote: >>>>> >>>>>> >>>>>> Hello Kathleen, >>>>>> >>>>>> There are 79 distinct tissues with two replicates each, which >>>>>> brings the number of values to 158 scores. The order of the tissues is in the >>>>>> gnfHumanAtlas2AllExps.id field, and the tissue names are in the >>>>>> gnfHumanAtlas2AllExps.name field.
>>>>>> >>>>>> For example, gnfHumanAtlas2AllExps.id =0 or =1, the first two fields >>>>>> are: >>>>>> >>>>>> id name >>>>>> >>>>>> 0 ColorectalAdenocarcinoma >>>>>> >>>>>> 1 ColorectalAdenocarcinoma 2 >>>>>> >>>>>> This per-tissue replication is explained on the track's description page >>>>>> (open the Assembly browser and click on the track name, or open the Table browser >>>>>> to the track, leave the primary table as-is, click on "describe table >>>>>> schema", then scroll to the bottom of the page). >>>>>> >>>>>> Hopefully this addresses your questions, but please let us know if you need >>>>>> more information, >>>>>> Jen >>>>>> >>>>>> --------------------------------- >>>>>> Jennifer Jackson >>>>>> UCSC Genome Informatics Group >>>>>> http://genome.ucsc.edu/ >>>>>> >>>>>> On 3/30/10 5:49 AM, kathleen askland wrote: >>>>>> >>>>>>> >>>>>>> I have recently downloaded human expression data via the UCSC Table >>>>>>> Browser using the following query parameters: Mammal, human, Assembly: >>>>>>> Feb 2009 (GRCh37/hg19), Group: Expression, Track: GNFAtlas2, Table: >>>>>>> hgFixed.gnfHumanAtlas2All, as I wanted all replicates >>>>>>> available for each probe. >>>>>>> >>>>>>> However, the output file is very difficult to understand. There were >>>>>>> 44775 probes (as expected) for which data are available. Each probe >>>>>>> has a corresponding 'hgFixed.gnfHumanAtlas2All.expCount' value = 158, >>>>>>> suggesting there should be 158 expression values per probe, and >>>>>>> the column headed 'hgFixed.gnfHumanAtlas2All.expScores' does in >>>>>>> fact contain 158 comma-separated absolute expression values. >>>>>>> >>>>>>> However, I am not able to obtain the EXP ids (i.e., tissue names) >>>>>>> associated with each of the 158 expression values in the sequence, so >>>>>>> how is one supposed to figure out which tissue each of the 158 >>>>>>> expression scores corresponds to?
>>>>>>> >>>>>>> I have attempted to obtain those expression IDs in several ways, by >>>>>>> selecting different associated tables to join and seemingly relevant >>>>>>> variables, to no avail. Even more confusingly, when I select >>>>>>> from associated table gnfHumanAtlas2MedianExps the variables >>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.id' and >>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.name', which would seem to be the desired >>>>>>> information, I get a series of comma-separated EXP ids and the >>>>>>> corresponding EXP id tissue names (e.g., 112 and Pancreas, >>>>>>> respectively), but there are generally not 158 entries in each of >>>>>>> these cells and many probes have 'n/a' in both columns. >>>>>>> >>>>>>> So, for example, probe '1007_s_at' has the following associated data: >>>>>>> hgFixed.gnfHumanAtlas2All.expCount='158', >>>>>>> hgFixed.gnfHumanAtlas2All.expScores= >>>>>>> '3621,3212,1078,1130,475,408,375,528,...' (158 comma-separated >>>>>>> values) >>>>>>> hgFixed.gnfHumanAtlas2AllExps.id= '112' >>>>>>> hgFixed.gnfHumanAtlas2AllExps.name='Pancreas' >>>>>>> >>>>>>> Probe '117_at', by contrast, gives: >>>>>>> hgFixed.gnfHumanAtlas2All.expCount='158', >>>>>>> hgFixed.gnfHumanAtlas2All.expScores= >>>>>>> '338,277,2383,2456,617,423,...' (158 comma-separated values) >>>>>>> hgFixed.gnfHumanAtlas2AllExps.id= >>>>>>> '52,74,75,85,94,96,98,112,121,127,129,137,' >>>>>>> hgFixed.gnfHumanAtlas2AllExps.name='cerebellum,CingulateCortex,CingulateCortex 2,Lung 2,Uterus,Thyroid,fetalThyroid,Pancreas,TestisGermCell 2,salivarygland 2,trachea 2,skin 2,' >>>>>>> >>>>>>> Since the number of expression values listed under >>>>>>> 'hgFixed.gnfHumanAtlas2All.expScores' does not match the >>>>>>> number of Expression IDs/names listed under >>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.id' and >>>>>>> 'hgFixed.gnfHumanAtlas2AllExps.name', respectively, how is one >>>>>>> supposed to figure out which tissue each of the 158
expression scores >>>>>>> corresponds to? >>>>>>> >>>>>>> >>> >>> >> >> >> >> > > -- Kathleen Askland, MD Assistant Professor Department of Psychiatry & Human Behavior Warren Alpert School of Medicine Brown University/Butler Hospital _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
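Mary explains that hgFixed.gnfHumanAtlas2AllExps lists replicates side-by-side (A,A,B,B,...). Jen's example of ids 0 and 1 ("ColorectalAdenocarcinoma" and "ColorectalAdenocarcinoma 2") fits that layout, which answers Kathleen's ordering question with her first option (0,0,1,1,...).

The Python sketch below builds on that. It assumes that ids 2k and 2k+1 are the two replicates of one tissue, and that the second replicate's name carries a " 2" suffix. Any pair that breaks this pattern is flagged rather than trusted. The function name is hypothetical.

Two cautions apply:
- hgFixed.gnfHumanAtlas2MedianExps uses a reordered tissue list, so any comparison with gnfAtlas2 medians should match tissues by name, not by position.
- The sketch is not claimed to reproduce UCSC's median values exactly.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def pair_replicates(scores: Sequence[float], exps: Dict[int, str]
                    ) -> Tuple[List[Tuple[str, float, float, float]], List[Tuple[int, str, str]]]:
    """Collapse side-by-side replicates (ids 2k and 2k+1) into one row per tissue.

    Returns (rows, suspicious). Each row is (name, replicate 1, replicate 2, median).
    `suspicious` lists pairs whose names do not follow the "<name>" / "<name> 2"
    pattern assumed here.
    """
    if len(scores) % 2:
        raise ValueError(f"{len(scores)} scores cannot be split into replicate pairs")
    rows, suspicious = [], []
    for k in range(0, len(scores), 2):
        first, second = exps[k], exps[k + 1]
        if second != f"{first} 2":
            suspicious.append((k, first, second))
        # The median of two values is their mean.
        rows.append((first, scores[k], scores[k + 1], median([scores[k], scores[k + 1]])))
    return rows, suspicious


if __name__ == "__main__":
    exps = {0: "ColorectalAdenocarcinoma", 1: "ColorectalAdenocarcinoma 2"}
    rows, suspicious = pair_replicates([3621.0, 3212.0], exps)
    print(rows)        # one tissue: both replicates plus their median
    print(suspicious)  # empty when the names follow the expected pattern
```

For a full probe, 158 scores become 79 tissue rows. A non-empty `suspicious` list means the side-by-side assumption does not hold for those ids and should be checked against the table itself.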
