Re: [R] read columns of quoted numbers as factors

2010-10-06 Thread Peter Dalgaard
On 10/06/2010 02:41 AM, james hirschorn wrote:
> Yes, your solution of setting quote="" would read the multi-word strings
> incorrectly. A more complicated version of your solution should work: First
> check which columns are identified as strings, and then apply your solution
> to the remaining columns.

Probably more painful than that if column separators can appear in
strings. The best I can think of involves trying to reread the columns
that get classified as numeric with colClasses="numeric" and seeing
whether they fail. A general solution likely requires changing scan() at
C-level.
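
The reread idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested general solution: the sample data and the helper name is_quoted_col are made up, and the sketch relies on the behaviour described in ?read.table that quotes are only interpreted for columns read as character, so a quoted number in a column forced to "numeric" should make the read fail.

```r
# Made-up sample: a multi-word string column, a quoted-number column,
# and a plain integer column.
txt <- c('label grp n',
         '"two words" "1" 10',
         '"x" "2" 20',
         '"y z" "1" 30')

# First pass: let read.table guess the column types.
first <- read.table(text = txt, header = TRUE)
num_cols <- which(sapply(first, is.numeric))

# Hypothetical helper: reread with column j forced to numeric.
# If the read fails, the column is taken to hold quoted values.
is_quoted_col <- function(j) {
  cc <- rep(NA_character_, ncol(first))  # NA means "guess as usual"
  cc[j] <- "numeric"
  tryCatch({
    read.table(text = txt, header = TRUE, colClasses = cc)
    FALSE
  }, error = function(e) TRUE)
}
quoted <- num_cols[sapply(num_cols, is_quoted_col)]

# Final pass: read the quoted-number columns as factors.
cc <- rep(NA_character_, ncol(first))
cc[quoted] <- "factor"
df <- read.table(text = txt, header = TRUE, colClasses = cc)
str(df)
```

The file is parsed once per numeric-looking column plus twice more, so for a file with a huge number of columns this could be slow.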

 
> I'm a newbie at R, but it seems to me that there is a logical inconsistency
> in R: write.table puts quotes around numbers when they form a column of
> factors, but does not put quotes for a column of integers. Since read.table
> is the dual of write.table it seems that it should treat quoted and unquoted
> columns differently, analogously to write.table. However, there does not
> even seem to be an option to make read.table behave analogously.

Yes, and far from the only such case in R. (Even more annoying to my
eyes is that factor levels get reordered alphabetically, so write.table
is really not an option for storage of data frames anyway).
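
A small sketch of the level-reordering point, using a made-up factor and a temporary file. The variable names are illustrative only.

```r
# Made-up factor whose levels have a meaningful, non-alphabetical order.
f <- factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))

tmp <- tempfile(fileext = ".txt")
write.table(data.frame(f = f), file = tmp, row.names = FALSE)

# stringsAsFactors = TRUE is spelled out because its default changed in R 4.0.
back <- read.table(tmp, header = TRUE, stringsAsFactors = TRUE)

print(levels(f))       # order chosen when the factor was created
print(levels(back$f))  # order after the round trip through the text file
unlink(tmp)
```

The text file stores only the labels, so the level order has nowhere to live and is rebuilt by sorting on input.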

However, the quoting of factor levels on output from write.table is not
happening to distinguish numbers from character strings. Rather, it is
for potentially multi-word level names.
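
To illustrate why the quoting matters for multi-word levels, here is a minimal sketch with made-up data. It uses capture.output() only to get at the lines that write.table would print.

```r
# Made-up data frame: one factor with a multi-word level, one factor with
# number-like levels, and one integer column.
df <- data.frame(size = factor(c("very large", "small")),
                 code = factor(c(2, 1)),
                 n    = c(10L, 20L))

# write.table() prints to the console when no file is given;
# capture.output() collects those lines as a character vector.
out <- capture.output(write.table(df, row.names = FALSE))
writeLines(out)

# The quotes keep "very large" together as one field on rereading.
back <- read.table(text = out, header = TRUE)
str(back)
```

The same quotes end up around the number-like levels of code, which is the side effect discussed in this thread.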


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read columns of quoted numbers as factors

2010-10-05 Thread Bernardo Rangel Tura
On Mon, 2010-10-04 at 09:39 -0700, james hirschorn wrote:
> Suppose I have a data file (possibly with a huge number of columns), where
> the columns with factors are coded as "1", "2", "3", etc ... The default
> behavior of read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers),
> without explicitly setting them with colClasses?
 

Hi James,

I think you can solve your problem using the colClasses option of the
read.table command, something like this:

read.table('name.of.table', colClasses = c(rep('integer', 30), rep('numeric', 5)))  # and so on for the remaining columns
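
A self-contained variant of the same idea, with a made-up three-column sample in place of a real file. Here colClasses is used to request a factor for the quoted column directly.

```r
# Made-up sample: quoted codes, plain integers, and decimals.
txt <- c('grp n x',
         '"1" 10 0.5',
         '"2" 20 1.5',
         '"1" 30 2.5')

# One class per column, in file order.
df <- read.table(text = txt, header = TRUE,
                 colClasses = c("factor", "integer", "numeric"))
str(df)
```

This needs the class of every column to be known in advance, which is exactly what the original question hoped to avoid.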
-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil



Re: [R] read columns of quoted numbers as factors

2010-10-05 Thread peter dalgaard

On Oct 4, 2010, at 18:39 , james hirschorn wrote:

> Suppose I have a data file (possibly with a huge number of columns), where
> the columns with factors are coded as "1", "2", "3", etc ... The default
> behavior of read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers),
> without explicitly setting them with colClasses?

I don't think there's a simple way, because the modus operandi of read.table is 
to read everything as character and then see whether it can be converted to 
numeric, and at that point any quotes will have been lost.
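
The conversion step can be illustrated with type.convert(), which read.table uses after the fields have been read as character. The values below are made up.

```r
# What a column looks like once the scanner has stripped any quotes
# (illustrative values only).
was_quoted   <- c("1", "2", "3")   # appeared as "1" "2" "3" in the file
was_unquoted <- c("1", "2", "3")   # appeared as 1 2 3 in the file

# The two character vectors are indistinguishable at this stage ...
same_input <- identical(was_quoted, was_unquoted)
print(same_input)

# ... so the conversion step gives the same type for both.
converted <- type.convert(was_quoted, as.is = TRUE)
print(class(converted))
```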

One possibility, somewhat dependent on the exact file format, would be to 
temporarily set quote="", see which columns contain quote characters, and, on 
a second pass, read those columns as factors, using a computed colClasses 
argument. It will break down if you have space-separated columns with quoted 
multi-word strings, though.
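
A rough sketch of that two-pass approach, assuming a whitespace-separated file with no spaces inside quoted fields. The sample data and the variable names are made up.

```r
# Made-up whitespace-separated sample with no spaces inside quoted fields.
txt <- c('id grp n',
         '"a" "1" 10',
         '"b" "2" 20',
         '"c" "1" 30')

# Pass 1: disable quote handling so the quote characters stay in the data.
raw <- read.table(text = txt, header = TRUE, quote = "",
                  colClasses = "character")

# A column counts as "quoted digits" if every entry looks like "123".
quoted_num <- sapply(raw, function(x) all(grepl('^"[0-9]+"$', x)))

# Pass 2: read normally, forcing those columns to factor and letting
# read.table guess the rest (NA in colClasses means "guess").
cc <- unname(ifelse(quoted_num, "factor", NA_character_))
df <- read.table(text = txt, header = TRUE, colClasses = cc)
str(df)
```

Only column positions are carried from the first pass to the second, so it does not matter if the header names come out mangled when quoting is disabled.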


 
-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] read columns of quoted numbers as factors

2010-10-05 Thread Mike Marchywka

> From: pda...@gmail.com
> Date: Tue, 5 Oct 2010 13:25:52 +0200
> To: j_hirsch...@yahoo.com
> CC: r-help@r-project.org
> Subject: Re: [R] read columns of quoted numbers as factors
>
> On Oct 4, 2010, at 18:39 , james hirschorn wrote:
>
>> Suppose I have a data file (possibly with a huge number of columns), where
>> the columns with factors are coded as "1", "2", "3", etc ... The default
>> behavior of read.table is to convert these columns to integer vectors.
>>
>> Is there a way to get read.table to recognize that columns of quoted numbers
>> represent factors (while unquoted numbers are interpreted as integers),
>> without explicitly setting them with colClasses?
>
> I don't think there's a simple way, because the modus operandi of read.table
> is to read everything as character and then see whether it can be converted
> to numeric, and at that point any quotes will have been lost.
>
> One possibility, somewhat dependent on the exact file format, would be to
> temporarily set quote="", see which columns contain quote characters, and,
> on a second pass, read those columns as factors, using a computed colClasses
> argument. It will break down if you have space-separated columns with quoted
> multi-word strings, though.

While this specific example may or may not lend itself to a solution within R,
I would just mention that it is not a faux pas to modify your data file
with something like sed or awk prior to feeding it to some program like R.
Quotes, spaces, commas, etc., may be something that the target app can handle
or it may just be easier to change the format with a familiar tool designed
for that.

  


Re: [R] read columns of quoted numbers as factors

2010-10-05 Thread james hirschorn
Yes, your solution of setting quote="" would read the multi-word strings 
incorrectly. A more complicated version of your solution should work: First 
check which columns are identified as strings, and then apply your solution to 
the remaining columns.

I'm a newbie at R, but it seems to me that there is a logical inconsistency 
in R: write.table puts quotes around numbers when they form a column of 
factors, but does not put quotes for a column of integers. Since read.table 
is the dual of write.table it seems that it should treat quoted and unquoted 
columns differently, analogously to write.table. However, there does not even 
seem to be an option to make read.table behave analogously.




Re: [R] read columns of quoted numbers as factors

2010-10-05 Thread David Winsemius


On Oct 5, 2010, at 8:41 PM, james hirschorn wrote:

> Yes, your solution of setting quote="" would read the multi-word strings
> incorrectly. A more complicated version of your solution should work: First
> check which columns are identified as strings, and then apply your solution
> to the remaining columns.
>
> I'm a newbie at R, but it seems to me that there is a logical inconsistency
> in R: write.table puts quotes around numbers when they form a column of
> factors, but does not put quotes for a column of integers.


Factors are internally represented as positive integers, but have a  
separate layer of their levels and labels. What I suspect you are  
seeing and calling numbers are the character-valued labels.


write.table(data.frame(nums=-1:-5, facs=factor(-1:-5)), file="",
            row.names=FALSE)

"nums" "facs"
-1 "-1"
-2 "-2"
-3 "-3"
-4 "-4"
-5 "-5"

That does not seem at all logically inconsistent to me.
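
The distinction between the integer codes and the character labels can be seen by taking the same factor apart. This is a small sketch and the variable names are illustrative.

```r
f <- factor(-1:-5)

# Internal representation: positive integer codes indexing into the levels.
codes <- as.integer(f)
print(codes)

# The levels are character strings, sorted when the factor was created.
labs <- levels(f)
print(labs)

# What write.table writes out (and quotes) are the labels, not the codes.
print(as.character(f))
```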

--
David.


> Since read.table is the dual of write.table it seems that it should treat
> quoted and unquoted columns differently, analogously to write.table. However,
> there does not even seem to be an option to make read.table behave
> analogously.


David Winsemius, MD
West Hartford, CT



Re: [R] read columns of quoted numbers as factors

2010-10-05 Thread Gabor Grothendieck
On Mon, Oct 4, 2010 at 12:39 PM, james hirschorn j_hirsch...@yahoo.com wrote:
> Suppose I have a data file (possibly with a huge number of columns), where
> the columns with factors are coded as "1", "2", "3", etc ... The default
> behavior of read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers),
> without explicitly setting them with colClasses?

Although it's a bit messy, it's nevertheless only a few lines of code to
transform the quote-and-digit columns to non-numeric, read them in and
transform back. For example, if ! does not appear in the file we could
insert ! characters into the quote-and-digit columns and remove them
afterwards:

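L <- readLines("myfile.dat")
L2 <- gsub('"(\\d+)"', '"!\\1"', L) # insert ! after the opening quote
# stringsAsFactors = TRUE is needed in R >= 4.0, where strings are no
# longer converted to factors by default
DF <- read.table(textConnection(L2), header = TRUE, stringsAsFactors = TRUE)

# remove ! from the factor columns and rebuild the factors
ix <- sapply(DF, is.factor)
DF[ix] <- lapply(DF[ix], function(x) factor(gsub("!", "", x)))

str(DF)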


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



[R] read columns of quoted numbers as factors

2010-10-04 Thread james hirschorn
Suppose I have a data file (possibly with a huge number of columns), where the 
columns with factors are coded as "1", "2", "3", etc ... The default behavior 
of read.table is to convert these columns to integer vectors. 

Is there a way to get read.table to recognize that columns of quoted numbers 
represent factors (while unquoted numbers are interpreted as integers), without 
explicitly setting them with colClasses?
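
A minimal reproduction of the behaviour described, using a made-up two-column sample rather than the actual data file.

```r
# Made-up sample: "grp" holds quoted codes, "n" holds plain integers.
txt <- c('grp n',
         '"1" 10',
         '"2" 20',
         '"3" 30')

df <- read.table(text = txt, header = TRUE)

# Inspect the class read.table chose for each column.
classes <- sapply(df, class)
print(classes)
```

Both columns are reported as integer, so the quoting in the file leaves no trace in the result.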
