Re: [R] .gct file
On Tue, 19 Jul 2005, Marc Schwartz (via MN) wrote:

> For the TAB delimited columns, adjust the 'sep' argument to:
>
>   read.table("data.gct", skip = 2, header = TRUE, sep = "\t")
>
> The 'quote' argument is by default:
>
>   quote = "\"'"
>
> which should take care of the quoted strings and bring them in as a single value.
>
> The above presumes that the header row is also TAB delimited. If not, you may have to set 'skip = 3' to skip over the header row and manually set the column names.

Not quite. You can open a connection, skip 2 rows and read one to get the column names, then read the rest of the file using read.table on the open connection using the column names you just read.

However, based on what we have been shown

  read.table("data.gct", skip = 2, header = TRUE)

ought to work as the file looks as if it is white-space delimited (a tab is white space).

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
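The connection-based approach described in the reply above could look roughly like the sketch below. The file contents and the temporary path are made up for illustration; only the two leading lines and the NAME/DESCRIPTION/VALUE header follow what was shown in the thread.

```r
# Stand-in for the poster's file; the rows are invented for this sketch.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2 1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\t\"a kinase\"\t42"),
           path)

con <- file(path, open = "rt")        # open once so each read continues where the last one stopped
invisible(readLines(con, n = 2))      # skip the version and matrix-size lines
hdr <- scan(con, what = "", nlines = 1, quiet = TRUE)  # read one line: the column names
dat <- read.table(con, col.names = hdr, sep = "\t")    # read the remaining rows
close(con)

print(dat)
```

Because the connection is already open, read.table starts at the current position instead of at the top of the file, which is what makes the three-step read possible.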
Re: [R] .gct file
Those wondering what gct stands for might be interested in [1]. The original poster can check whether the package 'ctc' [2] supports reading this format, but I think Prof. Ripley's solution works too.

[1] http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html
[2] http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/ctc.html

On Wed, 2005-07-20 at 09:52 +0100, Prof Brian Ripley wrote:

> Not quite. You can open a connection, skip 2 rows and read one to get the column names, then read the rest of the file using read.table on the open connection using the column names you just read.
>
> However, based on what we have been shown
>
>   read.table("data.gct", skip = 2, header = TRUE)
>
> ought to work as the file looks as if it is white-space delimited (a tab is white space).
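Since the second line of the file was described in the thread as the matrix size, one could use it as a sanity check after reading. This is only a sketch: it assumes the two numbers are the count of data rows and the count of value columns, and that the first two columns are the name and description. Those assumptions and the sample rows are mine and should be checked against the format description in [1].

```r
# Made-up stand-in file: 2 data rows, 1 value column.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2 1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\ta protein inducer\t1123",
             "gene2\ta kinase\t42"),
           path)

top     <- readLines(path, n = 2)                 # version line and size line
version <- top[1]
dims    <- scan(text = top[2], what = integer(), quiet = TRUE)

dat <- read.table(path, skip = 2, header = TRUE, sep = "\t")

# Assumed meaning of the size line: rows, then value columns.
stopifnot(nrow(dat) == dims[1],
          ncol(dat) - 2 == dims[2])
```

If the check fails, either the file is truncated or the separator and quoting arguments are splitting the rows differently from what the file intends.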
[R] .gct file
I have two files to compare. One is a regular txt file that I can read in no prob. The other is a .gct file. How do I read in this one?

I tried a simple

  read.table("data.gct", header = T)

How do you suggest reading in this file?

thank you.
Re: [R] .gct file
On 7/19/2005 12:10 PM, mark salsburg wrote:

> I have two files to compare, one is a regular txt file that I can read in no prob. The other is a .gct file (How do I read in this one?)
>
> I tried a simple
>
>   read.table("data.gct", header = T)
>
> How do you suggest reading in this file??

.gct is not a standard filename extension. You need to know what is in that file. Where did you get it? What program created it?

Chances are the easiest thing to do is to get the program that created it to export in a well known format, e.g. .csv.

Duncan Murdoch
Re: [R] .gct file
On Tue, 2005-07-19 at 12:28 -0400, Duncan Murdoch wrote:

> .gct is not a standard filename extension. You need to know what is in that file. Where did you get it? What program created it?
>
> Chances are the easiest thing to do is to get the program that created it to export in a well known format, e.g. .csv.

A quick Google search would suggest Gene Cluster Text file:

http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html#gct

produced by Gene Pattern:

http://www.broad.mit.edu/cancer/software/genepattern/

If correct, I would point Mark to the Bioconductor folks for more information and assistance:

http://www.bioconductor.org/

HTH,

Marc Schwartz
Re: [R] .gct file
ok so the gct file looks like this:

#1.2                      (version number)
7283 19                   (matrix size)
Name Description Values
...
..

How can I tell R to disregard the first two lines and start reading at the 3rd line of this gct file? I would just delete them, but I do not know how to open a .gct file.

thank you
Re: [R] .gct file
If it is a text file, ?read.table should provide enough details to read the file into R. Based on the file format referenced below it shouldn't be too hard to get at the parts you want.

Randy

On 7/19/05 1:06 PM, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:

> A quick Google search would suggest Gene Cluster Text file:
>
> http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html#gct
>
> produced by Gene Pattern:
>
> http://www.broad.mit.edu/cancer/software/genepattern/
>
> If correct, I would point Mark to the Bioconductor folks for more information and assistance:
>
> http://www.bioconductor.org/

~~
Randy Johnson
Laboratory of Genomic Diversity
NCI-Frederick
Bldg 560, Rm 11-85
Frederick, MD 21702
(301)846-1304
~~
Re: [R] .gct file
On 7/19/2005 1:16 PM, mark salsburg wrote:

> ok so the gct file looks like this:
>
> #1.2                      (version number)
> 7283 19                   (matrix size)
> Name Description Values
> ...
>
> How can I tell R to disregard the first two lines and start reading the 3rd line in this gct file.

Use skip=2. See ?read.table.

Duncan Murdoch
Re: [R] .gct file
Try ?read.table or args(read.table). Might skip=2 do what you want?

spencer graves

p.s. I routinely call readLines(File, n=11) to see how many header lines there are AND to identify the sep character. Then I call quantile(count.fields(File, ...)) to see if all records have the same number of fields. Then I call something like read.table with the appropriate arguments. Then I print the first 2 or so rows of the result to make sure I read the file correctly.

mark salsburg wrote:

> How can I tell R to disregard the first two lines and start reading the 3rd line in this gct file. I would just delete them, but I do not know how to open a gct. file

--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
[EMAIL PROTECTED]
www.pdf.com http://www.pdf.com
Tel: 408-938-4420
Fax: 408-280-7915
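The inspection routine from the p.s. above might be sketched as follows. The file is a made-up stand-in, and the choices of sep = "\t" and skip = 2 are assumptions that one would normally settle on only after looking at the readLines output.

```r
# Made-up stand-in for the file under inspection.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2 1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\ta protein inducer\t1123",
             "gene2\ta kinase\t42"),
           path)

# Step 1: look at the raw top of the file to count header lines and spot the separator.
print(readLines(path, n = 5))

# Step 2: count fields per record; identical quantiles mean every record has the same width.
n_fields <- count.fields(path, sep = "\t", skip = 2)
print(quantile(n_fields))

# Step 3: read with the arguments suggested by steps 1 and 2.
dat <- read.table(path, skip = 2, header = TRUE, sep = "\t")

# Step 4: print the first rows to confirm the result looks right.
print(head(dat, 2))
```

If the quantiles in step 2 differ, that usually points at a wrong separator or at quoting that is splitting or merging fields on some lines.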
Re: [R] .gct file
On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:

> ok so the gct file looks like this:
>
> #1.2                      (version number)
> 7283 19                   (matrix size)
> Name Description Values
> ...
>
> How can I tell R to disregard the first two lines and start reading the 3rd line in this gct file.

The above would be consistent with the info in my reply.

I guess if the format is consistent, as per Mark's example above, you can use:

  read.table("data.gct", skip = 2, header = TRUE)

which will start by skipping the first two lines and then reading in the header row and then the data.

See ?read.table

HTH,

Marc Schwartz
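To see the skip = 2 call in action, here is a small self-contained sketch. The file is invented and uses plain spaces between columns, with the description in double quotes so that the default white-space splitting keeps it as one value.

```r
# Made-up file: two leading lines, a header row, then white-space separated data.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2 1",
             "NAME DESCRIPTION VALUE",
             "gene1 \"a protein inducer\" 1123",
             "gene2 \"a kinase\" 42"),
           path)

# skip = 2 drops the version and size lines; header = TRUE takes names from the next row.
dat <- read.table(path, skip = 2, header = TRUE)
str(dat)
```

With the default sep = "", any run of spaces or tabs separates fields, and the default quote setting keeps quoted strings together.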
Re: [R] .gct file
This is all extremely helpful.

The data, it turns out, is a little atypical: the columns are tab-delimited except for the description column.

DATA1.gct looks like this:

#1.2
23 3423
NAME DESCRIPTION VALUE
gene1 "a protein inducer" 1123
.
.
..

How do I get R to read the data as tab-delimited, but read in the 2nd column as one value based on the quotation marks?

thanks..
Re: [R] .gct file
For the TAB delimited columns, adjust the 'sep' argument to:

  read.table("data.gct", skip = 2, header = TRUE, sep = "\t")

The 'quote' argument is by default:

  quote = "\"'"

which should take care of the quoted strings and bring them in as a single value.

The above presumes that the header row is also TAB delimited. If not, you may have to set 'skip = 3' to skip over the header row and manually set the column names.

HTH,

Marc Schwartz

On Tue, 2005-07-19 at 13:52 -0400, mark salsburg wrote:

> The data, it turns out, is a little atypical: the columns are tab-delimited except for the description column.
>
> How do I get R to read the data as tab-delimited, but read in the 2nd column as one value based on the quotation marks?
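A quick way to try the sep = "\t" suggestion is sketched below on an invented file. One description is left unquoted and one is quoted, to show that with a tab separator both come in as a single value.

```r
# Made-up tab-delimited file; the description contains spaces in both rows.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2 1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\ta protein inducer\t1123",     # unquoted, spaces inside the field
             "gene2\t\"a kinase\"\t42"),           # quoted field
           path)

# quote is spelled out here only to show the default value mentioned above.
dat <- read.table(path, skip = 2, header = TRUE, sep = "\t", quote = "\"'")
print(dat)
```

One thing to watch for with the default quote setting: an apostrophe inside a description (for example 5' end) would be treated as an opening quote, so passing quote = "\"" may be safer for such data.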
Re: [R] .gct file
On Tue, 2005-07-19 at 13:08 -0500, Marc Schwartz (via MN) wrote:

> For the TAB delimited columns, adjust the 'sep' argument to:
>
>   read.table("data.gct", skip = 2, header = TRUE, sep = "\t")
>
> The 'quote' argument is by default:
>
>   quote = "\"'"
>
> which should take care of the quoted strings and bring them in as a single value.
>
> The above presumes that the header row is also TAB delimited. If not, you may have to set 'skip = 3' to skip over the header row and manually set the column names.

One correction. If the final para applies and you need to use 'skip = 3', you would also need to leave out the 'header = TRUE' argument, which defaults to FALSE.

Marc
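The corrected variant (skip = 3, no header = TRUE, column names set by hand) might look like the sketch below. The file is invented, with a header row separated by spaces and data rows separated by tabs, and the column names passed to col.names are simply copied from that header.

```r
# Made-up file whose header row is NOT tab-delimited, unlike the data rows.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2 1",
             "NAME DESCRIPTION VALUE",
             "gene1\ta protein inducer\t1123",
             "gene2\ta kinase\t42"),
           path)

# skip = 3 also drops the header row, so the names are supplied manually
# and header is left at its default of FALSE.
dat <- read.table(path, skip = 3, sep = "\t",
                  col.names = c("NAME", "DESCRIPTION", "VALUE"))
print(dat)
```

Leaving header = TRUE in place here would consume the first data row as column names, which is the mistake the correction above guards against.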