Re: [R] read columns of quoted numbers as factors
On 10/06/2010 02:41 AM, james hirschorn wrote:

> Yes, your solution of setting quote="" would read the multi-word strings incorrectly. A more complicated version of your solution should work: First check which columns are identified as strings, and then apply your solution to the remaining columns.

Probably more painful than that if column separators can appear in strings. The best I can think of involves trying to reread the columns that get classified as numeric with colClasses="numeric" and see if they fail. A general solution likely requires changing scan() at C-level.

> I'm a newbie at R, but it seems to me that there is a logical inconsistency in R: write.table puts quotes around numbers when they form a column of factors, but does not put quotes for a column of integers. Since read.table is the dual of write.table it seems that it should treat quoted and unquoted columns differently, analogously to write.table. However, there does not even seem to be an option to make read.table behave analogously.

Yes, and far from the only such case in R. (Even more annoying to my eyes is that factor levels get reordered alphabetically, so write.table is really not an option for storage of data frames anyway.) However, the quoting of factor levels on output from write.table is not happening to distinguish numbers from character strings. Rather, it is for potentially multi-word level names.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
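The level reordering mentioned in this message can be seen with a small round trip. This is only a minimal sketch: the factor `f`, its level names and the temporary file are made up for illustration, and stringsAsFactors = TRUE is passed explicitly because it is no longer the default from R 4.0.0 on.

```r
# Hypothetical factor whose levels are deliberately not in alphabetical order.
f <- factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))

# Round trip through a temporary text file.
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(f = f), file = tmp, row.names = FALSE)
back <- read.table(tmp, header = TRUE, stringsAsFactors = TRUE)
unlink(tmp)

levels(f)       # level order chosen when the factor was created
levels(back$f)  # level order after reading back: sorted alphabetically
```

The values survive the round trip, but the level order is whatever read.table derives from the strings it sees, so a deliberate ordering has to be restored by hand afterwards.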
Re: [R] read columns of quoted numbers as factors
On Mon, 2010-10-04 at 09:39 -0700, james hirschorn wrote:

> Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors. Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?

Hi James,

I think you can solve your problem using the colClasses option of the read.table command, something like this:

    read.table('name.of.table', colClasses=c(rep('integer',30), rep('numeric',5), etc))

--
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil
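For reference, an explicit colClasses vector can be built with rep() roughly as follows. This is a sketch only: the column layout (two factor columns, one integer, one numeric) and the inline data are hypothetical, not taken from the poster's file.

```r
# Hypothetical layout: two factor columns, then one integer and one numeric.
cc <- c(rep("factor", 2), "integer", "numeric")

# Inline stand-in for the data file, with quoted factor codes.
DF <- read.table(text = c('"f1" "f2" "n" "x"',
                          '"1" "3" 10 0.5',
                          '"2" "3" 20 1.5'),
                 header = TRUE, colClasses = cc)

sapply(DF, class)
```

Note that rep() takes the value first and the repeat count second, and that this still requires knowing the column types in advance, which is what the original question hoped to avoid.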
Re: [R] read columns of quoted numbers as factors
On Oct 4, 2010, at 18:39 , james hirschorn wrote:

> Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors. Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?

I don't think there's a simple way, because the modus operandi of read.table is to read everything as character and then see whether it can be converted to numeric, and at that point any quotes will have been lost.

One possibility, somewhat dependent on the exact file format, would be to temporarily set quote="", see which columns contain quote characters, and, on a second pass, read those columns as factors, using a computed colClasses argument. It will break down if you have space-separated columns with quoted multi-word strings, though.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
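The two-pass idea described in this message could look roughly like the sketch below. The sample lines and the names `raw`, `quoted` and `cc` are assumptions made for illustration, and the sketch inherits the stated limitation: it will misbehave on space-separated files with quoted multi-word strings.

```r
# Hypothetical file contents: "grp" is written with quotes, "n" and "x" are not.
lines <- c('"grp" "n" "x"',
           '"1" 10 0.5',
           '"2" 20 1.5',
           '"1" 30 2.5')

# Pass 1: quote = "" keeps the quote characters inside the fields, and
# colClasses = "character" switches off any type conversion.
raw <- read.table(text = lines, header = TRUE, quote = "",
                  colClasses = "character", check.names = FALSE)

# A column counts as quoted if every one of its fields is wrapped in double quotes.
quoted <- vapply(raw, function(col) all(grepl('^".*"$', col)), logical(1))

# Pass 2: normal quoting; quoted columns are forced to factor,
# NA lets read.table choose the type of the remaining columns.
cc <- unname(ifelse(quoted, "factor", NA))
DF <- read.table(text = lines, header = TRUE, colClasses = cc)
str(DF)
```

For a real file, the `text = lines` argument would be replaced by the file name in both calls, at the cost of reading the file twice.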
Re: [R] read columns of quoted numbers as factors
> From: pda...@gmail.com
> Date: Tue, 5 Oct 2010 13:25:52 +0200
> To: j_hirsch...@yahoo.com
> CC: r-help@r-project.org
> Subject: Re: [R] read columns of quoted numbers as factors
>
> On Oct 4, 2010, at 18:39 , james hirschorn wrote:
>
>> Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors. Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?
>
> I don't think there's a simple way, because the modus operandi of read.table is to read everything as character and then see whether it can be converted to numeric, and at that point any quotes will have been lost.
>
> One possibility, somewhat dependent on the exact file format, would be to temporarily set quote="", see which columns contain quote characters, and, on a second pass, read those columns as factors, using a computed colClasses argument. It will break down if you have space-separated columns with quoted multi-word strings, though.

While this specific example may or may not lend itself to a solution within R, I would just mention that it is not a faux pas to modify your data file with something like sed or awk prior to feeding it to some program like R. Quotes, spaces, commas, etc. may be something that the target app can handle, or it may just be easier to change the format with a familiar tool designed for that.
Re: [R] read columns of quoted numbers as factors
Yes, your solution of setting quote="" would read the multi-word strings incorrectly. A more complicated version of your solution should work: First check which columns are identified as strings, and then apply your solution to the remaining columns.

I'm a newbie at R, but it seems to me that there is a logical inconsistency in R: write.table puts quotes around numbers when they form a column of factors, but does not put quotes for a column of integers. Since read.table is the dual of write.table it seems that it should treat quoted and unquoted columns differently, analogously to write.table. However, there does not even seem to be an option to make read.table behave analogously.

----- Original Message ----
From: peter dalgaard pda...@gmail.com
To: james hirschorn j_hirsch...@yahoo.com
Cc: r-help@r-project.org
Sent: Tue, October 5, 2010 7:25:52 AM
Subject: Re: [R] read columns of quoted numbers as factors

On Oct 4, 2010, at 18:39 , james hirschorn wrote:

> Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors. Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?

I don't think there's a simple way, because the modus operandi of read.table is to read everything as character and then see whether it can be converted to numeric, and at that point any quotes will have been lost.

One possibility, somewhat dependent on the exact file format, would be to temporarily set quote="", see which columns contain quote characters, and, on a second pass, read those columns as factors, using a computed colClasses argument. It will break down if you have space-separated columns with quoted multi-word strings, though.
Re: [R] read columns of quoted numbers as factors
On Oct 5, 2010, at 8:41 PM, james hirschorn wrote:

> Yes, your solution of setting quote="" would read the multi-word strings incorrectly. A more complicated version of your solution should work: First check which columns are identified as strings, and then apply your solution to the remaining columns.
>
> I'm a newbie at R, but it seems to me that there is a logical inconsistency in R: write.table puts quotes around numbers when they form a column of factors, but does not put quotes for a column of integers.

Factors are internally represented as positive integers, but have a separate layer of their levels and labels. What I suspect you are seeing and calling numbers are the character-valued labels.

    write.table(data.frame(nums=-1:-5, facs=factor(-1:-5)), file="", row.names=FALSE)
    "nums" "facs"
    -1 "-1"
    -2 "-2"
    -3 "-3"
    -4 "-4"
    -5 "-5"

That does not seem at all logically inconsistent to me.

--
David.

> Since read.table is the dual of write.table it seems that it should treat quoted and unquoted columns differently, analogously to write.table. However, there does not even seem to be an option to make read.table behave analogously.
>
> ----- Original Message ----
> From: peter dalgaard pda...@gmail.com
> To: james hirschorn j_hirsch...@yahoo.com
> Cc: r-help@r-project.org
> Sent: Tue, October 5, 2010 7:25:52 AM
> Subject: Re: [R] read columns of quoted numbers as factors
>
> On Oct 4, 2010, at 18:39 , james hirschorn wrote:
>
>> Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors. Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?
>
> I don't think there's a simple way, because the modus operandi of read.table is to read everything as character and then see whether it can be converted to numeric, and at that point any quotes will have been lost.
>
> One possibility, somewhat dependent on the exact file format, would be to temporarily set quote="", see which columns contain quote characters, and, on a second pass, read those columns as factors, using a computed colClasses argument. It will break down if you have space-separated columns with quoted multi-word strings, though.

David Winsemius, MD
West Hartford, CT
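The distinction between a factor's integer codes and its character labels can be inspected directly. A minimal sketch, reusing the factor(-1:-5) from this message; the variable name `f` is only for illustration.

```r
f <- factor(-1:-5)

as.integer(f)    # internal codes: positive integers indexing into the levels
levels(f)        # the labels: character strings, sorted by numeric value
as.character(f)  # what write.table prints (inside quotes) for each row
```

The quotes in the write.table output therefore mark a character-valued label, not a quoted number.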
Re: [R] read columns of quoted numbers as factors
On Mon, Oct 4, 2010 at 12:39 PM, james hirschorn j_hirsch...@yahoo.com wrote:

> Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors. Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?

Although it's a bit messy, it's nevertheless only a few lines of code to transform the quote-and-digit columns to non-numeric, read them in and transform back. For example, if ! does not appear in the file we could insert ! characters into the quote-and-digit columns and remove them afterwards:

    L <- readLines("myfile.dat")
    L2 <- gsub('"(\\d+)"', '"!\\1"', L)  # insert ! after the opening quote
    # stringsAsFactors = TRUE is needed from R 4.0.0 on, where it is no longer the default
    DF <- read.table(textConnection(L2), header = TRUE, stringsAsFactors = TRUE)

    # remove !
    ix <- sapply(DF, is.factor)
    DF[ix] <- lapply(DF[ix], function(x) factor(gsub("!", "", x)))
    str(DF)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
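To try the marker trick without a file on disk, the same steps can be run on in-memory lines. This is a sketch under assumptions: the sample lines are hypothetical, `text =` replaces the textConnection() call, and stringsAsFactors = TRUE is set explicitly for R 4.0.0 and later.

```r
# Hypothetical stand-in for readLines("myfile.dat").
L <- c('"grp" "n"',
       '"1" 10',
       '"2" 20',
       '"1" 30')

# Insert ! after the opening quote of every quoted run of digits.
L2 <- gsub('"(\\d+)"', '"!\\1"', L)

# "!1" is no longer numeric, so the quoted columns are read as factors.
DF <- read.table(text = L2, header = TRUE, stringsAsFactors = TRUE)

# Strip the marker again from the factor columns only.
ix <- sapply(DF, is.factor)
DF[ix] <- lapply(DF[ix], function(x) factor(gsub("!", "", x, fixed = TRUE)))
str(DF)
```

As in the message, this relies on ! not occurring anywhere else in the data; any other character that is absent from the file would do as the marker.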
[R] read columns of quoted numbers as factors
Suppose I have a data file (possibly with a huge number of columns), where the columns with factors are coded as "1", "2", "3", etc. The default behavior of read.table is to convert these columns to integer vectors.

Is there a way to get read.table to recognize that columns of quoted numbers represent factors (while unquoted numbers are interpreted as integers), without explicitly setting them with colClasses?