Re: [R] .gct file

2005-07-20 Thread Prof Brian Ripley
On Tue, 19 Jul 2005, Marc Schwartz (via MN) wrote:

 For the TAB delimited columns, adjust the 'sep' argument to:

 read.table("data.gct", skip = 2, header = TRUE, sep = "\t")

 The 'quote' argument is by default:

 quote = "\"'"

 which should take care of the quoted strings and bring them in as a
 single value.

 The above presumes that the header row is also TAB delimited. If not,
 you may have to set 'skip = 3' to skip over the header row and manually
 set the column names.

Not quite.  You can open a connection, skip 2 rows and read one to get the 
column names, then read the rest of the file using read.table on the open 
connection using the column names you just read.
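
A minimal sketch of the connection approach described above, assuming a small tab-delimited file with a quoted description column. The temporary file, its contents, and the names `path`, `con`, `cols` and `dat` are made up for illustration; they are not the poster's data.

```r
# Hypothetical sample standing in for a real .gct file.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2\t1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\t\"a kinase\"\t87"),
           path)

con <- file(path, open = "r")
invisible(readLines(con, n = 2))        # skip the version and matrix-size lines
# Read exactly one more line and split it on white space to get the names.
cols <- scan(text = readLines(con, n = 1), what = "", quiet = TRUE)
# The connection is now positioned at the first data line.
dat <- read.table(con, col.names = cols)
close(con)

str(dat)
```

Because the connection stays open between calls, each read continues where the previous one stopped, so no line is read twice.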

However, based on what we have been shown

read.table("data.gct", skip = 2, header = TRUE)

ought to work as the file looks as if it is white-space delimited (a tab 
is white space).
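
To illustrate the point about white space, here is a sketch using a made-up file whose fields are separated by tabs and whose descriptions are quoted. The file contents and variable names are assumptions for the example only.

```r
# Hypothetical sample: tab-separated fields, quoted descriptions.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2\t1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\t\"a kinase\"\t87"),
           path)

# The default sep = "" splits on any run of white space (spaces or tabs),
# and the default quote characters keep each quoted description in one field.
dat <- read.table(path, skip = 2, header = TRUE)
dim(dat)
```

This relies on the descriptions being quoted; an unquoted description containing spaces would be split into several fields under the default separator.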



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] .gct file

2005-07-20 Thread Adaikalavan Ramasamy
Those wondering what a .gct file is might be interested in [1].

The original poster can see if the package 'ctc' [2] supports reading in
this format but I think Prof. Ripley's solution works too.

[1]http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html
[2]http://www.bioconductor.org/packages/bioc/stable/src/contrib/html/ctc.html
 




[R] .gct file

2005-07-19 Thread mark salsburg
I have two files to compare, one is a regular txt file that I can read
in no prob.

The other is a .gct file (How do I read in this one?)

I tried a simple

read.table("data.gct", header = T)

How do you suggest reading in this file??

thank you.



Re: [R] .gct file

2005-07-19 Thread Duncan Murdoch
On 7/19/2005 12:10 PM, mark salsburg wrote:
 I have two files to compare, one is a regular txt file that I can read
 in no prob.
 
 The other is a .gct file (How do I read in this one?)
 
 I tried a simple
 
 read.table("data.gct", header = T)
 
 How do you suggest reading in this file??
 

.gct is not a standard filename extension.  You need to know what is in 
that file.  Where did you get it?  What program created it?

Chances are the easiest thing to do is to get the program that created 
it to export in a well-known format, e.g. .csv.

Duncan Murdoch



Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 12:28 -0400, Duncan Murdoch wrote:
 .gct is not a standard filename extension.  You need to know what is in 
 that file.  Where did you get it?  What program created it?
 
 Chances are the easiest thing to do is to get the program that created 
 it to export in a well known format, e.g. .csv.
 
 Duncan Murdoch

A quick Google search suggests a Gene Cluster Text file:

http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html#gct

produced by GenePattern:

http://www.broad.mit.edu/cancer/software/genepattern/

If correct, I would point Mark to the Bioconductor folks for more
information and assistance:

http://www.bioconductor.org/

HTH,

Marc Schwartz



Re: [R] .gct file

2005-07-19 Thread mark salsburg
ok so the gct file looks like this:

#1.2  (version number)
7283 19   (matrix size)
Name Description Values
  ...  ..

How can I tell R to disregard the first two lines and start reading
at the 3rd line of this .gct file? I would just delete them, but I do not
know how to open a .gct file.

thank you



Re: [R] .gct file

2005-07-19 Thread Randy Johnson
If it is a text file, ?read.table should provide enough detail to read the
file into R. Based on the file format referenced below, it shouldn't be too
hard to get at the parts you want.

Randy


On 7/19/05 1:06 PM, Marc Schwartz (via MN) [EMAIL PROTECTED] wrote:

 A quick Google search would suggest Gene Cluster Text file:
 
 http://www.broad.mit.edu/cancer/software/genepattern/tutorial/gp_tutorial_fileformats.html#gct
 
 produced by Gene Pattern:
 
 http://www.broad.mit.edu/cancer/software/genepattern/
 
 If correct, I would point Mark to the Bioconductor folks for more
 information and assistance:
 
 http://www.bioconductor.org/
 
 HTH,
 
 Marc Schwartz
 

~~
Randy Johnson
Laboratory of Genomic Diversity
NCI-Frederick
Bldg 560, Rm 11-85
Frederick, MD 21702
(301)846-1304
~~



Re: [R] .gct file

2005-07-19 Thread Duncan Murdoch
On 7/19/2005 1:16 PM, mark salsburg wrote:
 ok so the gct file looks like this:
 
 #1.2  (version number)
 7283 19   (matrix size)
 Name Description Values
   ...  ..
 
 How can I tell R to disregard the first two lines and start reading
 at the 3rd line of this .gct file? I would just delete them, but I do not
 know how to open a .gct file.

Use skip=2.  See ?read.table.

Duncan Murdoch



Re: [R] .gct file

2005-07-19 Thread Spencer Graves
  Try ?read.table or args(read.table).  Might skip=2 do what you want?

  spencer graves
p.s.  I routinely readLines(File, n=11) to see how many headers there 
are AND identify the sep character.  Then I 
quantile(count.fields(File, ...)) to see if all records have the same 
number of fields.  Then I call something like read.table with the 
appropriate arguments.  Then I print the first 2 or so rows of the 
result to make sure I read the file correctly.
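
The inspection routine described in the p.s. might look roughly like this. It is only a sketch: the sample file is invented, and the `sep = "\t"` and `skip = 2` arguments passed to `count.fields` are assumptions about what the elided `...` would contain for a file like this one.

```r
# Hypothetical sample standing in for the real file.
File <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2\t1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\t\"a kinase\"\t87"),
           File)

readLines(File, n = 11)      # look at the header lines and the separator

# Number of fields on each line after the two leading lines.
n <- count.fields(File, sep = "\t", skip = 2)
quantile(n)                  # identical quantiles mean every record has the same width

dat <- read.table(File, skip = 2, header = TRUE, sep = "\t")
head(dat, 2)                 # check that the first rows look right
```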


-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

[EMAIL PROTECTED]
www.pdf.com
Tel:  408-938-4420
Fax: 408-280-7915



Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
On Tue, 2005-07-19 at 13:16 -0400, mark salsburg wrote:
 ok so the gct file looks like this:
 
 #1.2  (version number)
 7283 19   (matrix size)
 Name Description Values
   ...  ..
 
 How can I tell R to disregard the first two lines and start reading
 at the 3rd line of this .gct file? I would just delete them, but I do not
 know how to open a .gct file.
 
 thank you
 

The above would be consistent with the info in my reply.

I guess if the format is consistent, as per Mark's example above, you
can use:

read.table("data.gct", skip = 2, header = TRUE)

which will start by skipping the first two lines and then reading in the
header row and then the data.
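
The two skipped lines need not be thrown away. As a sketch, and assuming the second line gives the number of data rows followed by the number of value columns (those after the name and description columns), it can be used to sanity-check what was read. The sample file and the names `hdr`, `size` and `dims_match` are hypothetical.

```r
# Hypothetical sample: 2 data rows, 1 value column.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2\t1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\t\"a kinase\"\t87"),
           path)

hdr  <- readLines(path, n = 2)             # version line and matrix-size line
size <- scan(text = hdr[2], quiet = TRUE)  # numeric vector: rows, value columns

dat <- read.table(path, skip = 2, header = TRUE)

# Assumption: the name and description columns are not counted in size[2].
dims_match <- nrow(dat) == size[1] && (ncol(dat) - 2) == size[2]
dims_match
```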

See ?read.table

HTH,

Marc Schwartz



Re: [R] .gct file

2005-07-19 Thread mark salsburg
This is all extremely helpful.

The data, it turns out, is a little atypical: the columns are tab-delimited
except for the description column.


DATA1.gct looks like this

#1.2
23 3423
NAME DESCRIPTION VALUE
gene1 "a protein inducer" 1123
.  . ..

How do I get R to read the data as tab-delimited, but read in the 2nd
column as one value based on the quotation marks?

thanks..



Re: [R] .gct file

2005-07-19 Thread Marc Schwartz (via MN)
For the TAB delimited columns, adjust the 'sep' argument to:

read.table("data.gct", skip = 2, header = TRUE, sep = "\t")

The 'quote' argument is by default:

quote = "\"'"

which should take care of the quoted strings and bring them in as a
single value.

The above presumes that the header row is also TAB delimited. If not,
you may have to set 'skip = 3' to skip over the header row and manually
set the column names.
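
As a sketch of the tab-delimited case, the made-up file below mixes a quoted and an unquoted description, both containing spaces. The file contents and variable names are assumptions for illustration.

```r
# Hypothetical sample: header and data rows are all tab-delimited.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2\t1",
             "NAME\tDESCRIPTION\tVALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\ta kinase\t87"),
           path)

# With sep = "\t" only tabs split fields, so spaces inside a description
# are kept whether or not the description is quoted. The quote argument
# is spelled out here only to show its default value.
dat <- read.table(path, skip = 2, header = TRUE, sep = "\t", quote = "\"'")
as.character(dat$DESCRIPTION)
```

Since the single quote is also a quote character by default, a description containing a lone apostrophe may need `quote = "\""` instead.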

HTH,

Marc Schwartz




Re: [R] .gct file

2005-07-19 Thread Marc Schwartz
On Tue, 2005-07-19 at 13:08 -0500, Marc Schwartz (via MN) wrote:
 For the TAB delimited columns, adjust the 'sep' argument to:
 
 read.table("data.gct", skip = 2, header = TRUE, sep = "\t")
 
 The 'quote' argument is by default:
 
 quote = "\"'"
 
 which should take care of the quoted strings and bring them in as a
 single value.
 
 The above presumes that the header row is also TAB delimited. If not,
 you may have to set 'skip = 3' to skip over the header row and manually
 set the column names.

One correction: if the final paragraph applies and you need to use
'skip = 3', you would also need to leave out the 'header = TRUE' argument,
since 'header' defaults to FALSE.
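
A sketch of that corrected variant, assuming a file whose header row is space-delimited while the data rows are tab-delimited. The sample file and the column names passed to `col.names` are made up for the example.

```r
# Hypothetical sample: header row uses spaces, data rows use tabs.
path <- tempfile(fileext = ".gct")
writeLines(c("#1.2",
             "2\t1",
             "NAME DESCRIPTION VALUE",
             "gene1\t\"a protein inducer\"\t1123",
             "gene2\t\"a kinase\"\t87"),
           path)

# skip = 3 jumps over the header row as well; 'header' is left out,
# and the column names are supplied by hand.
dat <- read.table(path, skip = 3, sep = "\t",
                  col.names = c("NAME", "DESCRIPTION", "VALUE"))
dat
```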

Marc


