Re: [R] The behaviour of read.csv().
On 03/12/2010 7:08 AM, Duncan Murdoch wrote:
On 02/12/2010 9:59 PM, Rolf Turner wrote:
On 3/12/2010, at 3:48 PM, David Scott wrote:
On 03/12/10 14:33, Duncan Murdoch wrote:
SNIP

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

Absolutely agree with this Duncan. Playing around with .csv files is like playing with some sort of unstable explosive. I also avoid them as much as possible.

Where I work, everybody but me uses (yeuuccchhh!!!) Excel or SPSS. If we are to share data sets, *.csv files seem to be the most efficacious, if not the only, way to go.

I was going to suggest using DIF rather than CSV. It contains more internal information about the file (including the type of each entry), but has the disadvantage of being less readable, even though it is ASCII. However, in putting together a little demo, I found a couple of bugs in the R implementation of read.DIF, and it looks as though it ignores the internal type information. Sigh.

As of r53778, the bugs I noticed should be fixed. read.DIF now respects the internal type information, so it will keep character strings like "001" as type character (unless you ask it to change the type).

Duncan Murdoch

So far, we've had very few problems.
The one that started off this thread is the only one I can think of that related to the *.csv format. At least *.csv files have the virtue of being ASCII files, whence if things go wrong it is at least possible to dig into them with a text editor and figure out just what the problem is. cheers, Rolf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] The behaviour of read.csv().
On 6/12/2010, at 3:00 AM, Duncan Murdoch wrote:
I was going to suggest using DIF rather than CSV. It contains more internal information about the file (including the type of each entry), but has the disadvantage of being less readable, even though it is ASCII.

I don't think DIF is really the answer. My colleagues are familiar with the *.csv concept; they have never heard of ``DIF''. As I have said, we have had but few problems using *.csv. Better the devil you know ...

Furthermore I have to deal with data provided by various sources ``external'' to the research project that I work for. I have to use the data that these sources provide, in the format in which they provide it. If they give me *.csv files I count myself lucky.

Finally, there seems to be no ``write.DIF'' function, i.e. there is no way to produce *.DIF output, as far as I can tell. Hence it would not seem practical to use *.DIF as a data exchange standard.

However, in putting together a little demo, I found a couple of bugs in the R implementation of read.DIF, and it looks as though it ignores the internal type information. Sigh. As of r53778, the bugs I noticed should be fixed. read.DIF now respects the internal type information, so it will keep character strings like "001" as type character (unless you ask it to change the type).

What does ``r53778'' mean?

cheers,

Rolf
Re: [R] The behaviour of read.csv().
On Dec 5, 2010, at 2:14 PM, Rolf Turner wrote:
On 6/12/2010, at 3:00 AM, Duncan Murdoch wrote:
As of r53778, the bugs I noticed should be fixed. read.DIF now respects the internal type information, so it will keep character strings like "001" as type character (unless you ask it to change the type).

What does ``r53778'' mean?

I assumed it was a version sequence number:
http://cran.r-project.org/src/base-prerelease/

cheers, Rolf

--
David Winsemius, MD
West Hartford, CT
Re: [R] The behaviour of read.csv().
On 05/12/2010 2:14 PM, Rolf Turner wrote:
On 6/12/2010, at 3:00 AM, Duncan Murdoch wrote:
I was going to suggest using DIF rather than CSV. It contains more internal information about the file (including the type of each entry), but has the disadvantage of being less readable, even though it is ASCII.

I don't think DIF is really the answer. My colleagues are familiar with the *.csv concept; they have never heard of ``DIF''. As I have said, we have had but few problems using *.csv. Better the devil you know ...

Furthermore I have to deal with data provided by various sources ``external'' to the research project that I work for. I have to use the data that these sources provide, in the format in which they provide it. If they give me *.csv files I count myself lucky.

Finally, there seems to be no ``write.DIF'' function, i.e. there is no way to produce *.DIF output, as far as I can tell. Hence it would not seem practical to use *.DIF as a data exchange standard.

Sure, those are good points.

However, in putting together a little demo, I found a couple of bugs in the R implementation of read.DIF, and it looks as though it ignores the internal type information. Sigh. As of r53778, the bugs I noticed should be fixed. read.DIF now respects the internal type information, so it will keep character strings like "001" as type character (unless you ask it to change the type).

What does ``r53778'' mean?

Revision 53778 from the version control system. When you start R-patched or R-devel it will print this in the startup message, e.g.

R version 2.13.0 Under development (unstable) (2010-12-05 r53775)

(the r53775 there is from just before I saved the changes).

Duncan Murdoch

cheers, Rolf
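As an aside, the svn revision is also recorded in a running session, not just in the startup banner; a small sketch (the exact value naturally depends on your build):

```r
## The subversion revision of the running R build is stored in R.version:
R.version$`svn rev`   # e.g. "53775" for the development build Duncan quotes
R.version.string      # the human-readable version line
```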
Re: [R] The behaviour of read.csv().
On 02/12/2010 9:59 PM, Rolf Turner wrote:
On 3/12/2010, at 3:48 PM, David Scott wrote:
On 03/12/10 14:33, Duncan Murdoch wrote:
SNIP

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

Absolutely agree with this Duncan. Playing around with .csv files is like playing with some sort of unstable explosive. I also avoid them as much as possible.

Where I work, everybody but me uses (yeuuccchhh!!!) Excel or SPSS. If we are to share data sets, *.csv files seem to be the most efficacious, if not the only, way to go.

I was going to suggest using DIF rather than CSV. It contains more internal information about the file (including the type of each entry), but has the disadvantage of being less readable, even though it is ASCII. However, in putting together a little demo, I found a couple of bugs in the R implementation of read.DIF, and it looks as though it ignores the internal type information. Sigh.

Duncan Murdoch

So far, we've had very few problems. The one that started off this thread is the only one I can think of that related to the *.csv format. At least *.csv files have the virtue of being ASCII files, whence if things go wrong it is at least possible to dig into them with a text editor and figure out just what the problem is.
cheers,

Rolf
[R] The behaviour of read.csv().
I have recently been bitten by an aspect of the behaviour of the read.csv() function. Some lines in a (fairly large) *.csv file that I read in had too many entries. I would have hoped that this would cause read.csv() to throw an error, or at least issue a warning, but it read the file without complaint, putting the extra entries into an additional line.

This behaviour is illustrated by the toy example in the attached file ``junk.csv''. Just do

junk <- read.csv("junk.csv", header=TRUE)
junk

to see the problem. If the offending over-long line were in the fourth line of data or earlier, an error would be thrown, but if it is in the fifth line of data or later no error is given. This is in a way compatible with what the help on read.csv() says:

The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer.

However, the help for read.table() says the same thing. And yet if one does

gorp <- read.table("junk.csv", sep=",", header=TRUE)

one gets an error, whereas read.csv() gives none. Am I correct in saying that this is inappropriate behaviour on the part of read.csv(), or am I missing something?

cheers,

Rolf Turner

P. S.:

sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/C/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] datasets  utils     stats     graphics  grDevices methods   base

other attached packages:
[1] misc_0.0-13     gtools_2.6.2    spatstat_1.21-2 deldir_0.0-13
[5] mgcv_1.6-2      fortunes_1.4-0  MASS_7.3-8

loaded via a namespace (and not attached):
[1] grid_2.12.0        lattice_0.19-13    Matrix_0.999375-44 nlme_3.1-97
[5] tools_2.12.0
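[The junk.csv attachment has not survived in the archive. The following is a hypothetical reconstruction of a file with the shape Rolf describes -- names and values invented -- showing read.csv() silently wrapping the extra entries while read.table() objects:]

```r
## Hypothetical stand-in for the missing junk.csv attachment:
## five well-formed data lines, then one line with too many fields.
tf <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3", "4,5,6", "7,8,9", "10,11,12", "13,14,15",
             "16,17,18,19"),          # over-long line, past the first five
           tf)

junk <- read.csv(tf, header = TRUE)   # reads without error or warning;
junk                                  # the extra entry wraps onto a new row

## read.table(), whose effective default here is fill = FALSE, complains:
try(read.table(tf, sep = ",", header = TRUE))
```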
Re: [R] The behaviour of read.csv().
Rolf -
I'd suggest using

junk <- read.csv("junk.csv", header=TRUE, fill=FALSE)

if you don't want the behaviour you're seeing.

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu

On Fri, 3 Dec 2010, Rolf Turner wrote:
I have recently been bitten by an aspect of the behaviour of the read.csv() function. Some lines in a (fairly large) *.csv file that I read in had too many entries. I would have hoped that this would cause read.csv() to throw an error, or at least issue a warning, but it read the file without complaint, putting the extra entries into an additional line.

This behaviour is illustrated by the toy example in the attached file ``junk.csv''. Just do

junk <- read.csv("junk.csv", header=TRUE)
junk

to see the problem. If the offending over-long line were in the fourth line of data or earlier, an error would be thrown, but if it is in the fifth line of data or later no error is given. This is in a way compatible with what the help on read.csv() says:

The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer.

However, the help for read.table() says the same thing. And yet if one does

gorp <- read.table("junk.csv", sep=",", header=TRUE)

one gets an error, whereas read.csv() gives none. Am I correct in saying that this is inappropriate behaviour on the part of read.csv(), or am I missing something?

cheers,

Rolf Turner
Re: [R] The behaviour of read.csv().
On 3/12/2010, at 1:08 PM, Phil Spector wrote:
Rolf -
I'd suggest using

junk <- read.csv("junk.csv", header=TRUE, fill=FALSE)

if you don't want the behaviour you're seeing.

The point is not that I don't want this kind of behaviour. The point is that it seems to me to be unexpected and dangerous. I can indeed take precautions against it, now that I know about it, by specifying fill=FALSE. Given that I remember to do so.

Now that you've pointed it out I can see that this is the reason for the different behaviour between read.table() and read.csv(); in read.table() fill=FALSE is effectively the default. Having fill=TRUE as the default in read.csv() strikes me as being counter-intuitive and dangerous.

cheers,

Rolf
Re: [R] The behaviour of read.csv().
On 2010-12-02 16:26, Rolf Turner wrote:
On 3/12/2010, at 1:08 PM, Phil Spector wrote:
Rolf -
I'd suggest using

junk <- read.csv("junk.csv", header=TRUE, fill=FALSE)

if you don't want the behaviour you're seeing.

The point is not that I don't want this kind of behaviour. The point is that it seems to me to be unexpected and dangerous. I can indeed take precautions against it, now that I know about it, by specifying fill=FALSE. Given that I remember to do so.

Now that you've pointed it out I can see that this is the reason for the different behaviour between read.table() and read.csv(); in read.table() fill=FALSE is effectively the default. Having fill=TRUE as the default in read.csv() strikes me as being counter-intuitive and dangerous.

Rolf,

This is not to argue with your point re counter-intuitive, but I always run a count.fields() first if I haven't seen (or can't easily see) the file in my editor. I must have learned that the hard way a long time ago.

Peter Ehlers

cheers, Rolf
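Peter's check can be scripted as a pre-flight step before any read.csv() call; a sketch (the file name is hypothetical, and note the quote argument, which comes up later in this thread):

```r
## Sanity check: do all lines of the file agree on the number of fields?
n <- count.fields("junk.csv", sep = ",", quote = "\"")
table(n)                 # a single entry means the file is rectangular
which(n != median(n))    # line numbers of any suspect rows
```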
Re: [R] The behaviour of read.csv().
On 02/12/2010 8:04 PM, Peter Ehlers wrote:
On 2010-12-02 16:26, Rolf Turner wrote:
On 3/12/2010, at 1:08 PM, Phil Spector wrote:
Rolf -
I'd suggest using

junk <- read.csv("junk.csv", header=TRUE, fill=FALSE)

if you don't want the behaviour you're seeing.

The point is not that I don't want this kind of behaviour. The point is that it seems to me to be unexpected and dangerous. I can indeed take precautions against it, now that I know about it, by specifying fill=FALSE. Given that I remember to do so.

Now that you've pointed it out I can see that this is the reason for the different behaviour between read.table() and read.csv(); in read.table() fill=FALSE is effectively the default. Having fill=TRUE as the default in read.csv() strikes me as being counter-intuitive and dangerous.

Rolf,

This is not to argue with your point re counter-intuitive, but I always run a count.fields() first if I haven't seen (or can't easily see) the file in my editor. I must have learned that the hard way a long time ago.

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

Duncan Murdoch
Re: [R] The behaviour of read.csv().
On 3/12/2010, at 2:04 PM, Peter Ehlers wrote:
SNIP

Rolf,

This is not to argue with your point re counter-intuitive, but I always run a count.fields() first if I haven't seen (or can't easily see) the file in my editor. I must have learned that the hard way a long time ago.

Sound advice! Thanks.

I'd just like to point out however that it might be an idea to set quote="\"" in the call to count.fields() --- to make its idea of how many fields there are consistent with that of read.csv(). In count.fields() quote defaults to "\"'" whereas in read.csv() it defaults to "\"".

cheers,

Rolf
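The mismatch Rolf describes is visible directly from the two functions' formal arguments:

```r
## The default 'quote' arguments differ between the two functions:
formals(count.fields)$quote   # "\"'"  -- both double and single quotes
formals(read.csv)$quote       # "\""   -- double quotes only

## Hence, for field counts consistent with read.csv(), pass it explicitly:
## count.fields("junk.csv", sep = ",", quote = "\"")
```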
Re: [R] The behaviour of read.csv().
On Dec 2, 2010, at 8:33 PM, Duncan Murdoch wrote:
snipped

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

In both Excel and in OO.org you can select a column (or any other range) and set its format to text. (The default is numeric, not that different from read.table()'s default behavior.) Once a format has been set, you then do not need leading quotes. I just created a small example with OO.org Calc, entered leading 0's without leading quotes, and this code runs as desired after copying the three cells to the clipboard:

read.table(pipe("pbpaste"), colClasses="character")
    V1
1   01
2  004
3 0005

The same applies to date fields in both OO.org and Excel. In this regard, it is simply a matter of understanding what is the defined behavior of your software and how one can manipulate it. This is no different than learning R's classes, coercing them to your ends, and dealing with other formatting issues.

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

No argument there. I know one physician whose weapon of choice is Stata who always uses | as his separator, but that's perhaps because he works entirely in Windows. I imagine that might not be the most uncommon character in *NIXen.
--
David Winsemius, MD
West Hartford, CT
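[David's clipboard example is OS X-specific (pbpaste). The same colClasses trick works with any connection; a sketch using a textConnection so it runs anywhere:]

```r
## colClasses = "character" keeps leading zeros from any input source,
## not just the OS X clipboard pipe used above.
tc <- textConnection("V1\n01\n004\n0005")
read.table(tc, header = TRUE, colClasses = "character")
close(tc)
## prints the column as character, zeros intact; without colClasses the
## values would be read as numeric and the leading zeros dropped.
```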
Re: [R] The behaviour of read.csv().
On 02/12/2010 9:18 PM, David Winsemius wrote:
On Dec 2, 2010, at 8:33 PM, Duncan Murdoch wrote:
snipped

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

In both Excel and in OO.org you can select a column (or any other range) and set its format to text. (The default is numeric, not that different from read.table()'s default behavior.) Once a format has been set, you then do not need leading quotes. I just created a small example with OO.org Calc, entered leading 0's without leading quotes, and this code runs as desired after copying the three cells to the clipboard:

read.table(pipe("pbpaste"), colClasses="character")
    V1
1   01
2  004
3 0005

The same applies to date fields in both OO.org and Excel. In this regard, it is simply a matter of understanding what is the defined behavior of your software and how one can manipulate it. This is no different than learning R's classes, coercing them to your ends, and dealing with other formatting issues.

You're right, I shouldn't have picked on Excel particularly here, but it really is a bizarre format that says the default way to read a file containing

V1
01
004
0005

is to assume that the column contains numeric values. (Yes, read.csv() makes this same assumption.) My main complaint is with the format.

Duncan Murdoch

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

No argument there.
I know one physician whose weapon of choice is Stata who always uses | as his separator, but that's perhaps because he works entirely in Windows. I imagine that might not be the most uncommon character in *NIXen.

--
David Winsemius, MD
West Hartford, CT
Re: [R] The behaviour of read.csv().
On 03/12/10 14:33, Duncan Murdoch wrote:
On 02/12/2010 8:04 PM, Peter Ehlers wrote:
On 2010-12-02 16:26, Rolf Turner wrote:
On 3/12/2010, at 1:08 PM, Phil Spector wrote:
Rolf -
I'd suggest using

junk <- read.csv("junk.csv", header=TRUE, fill=FALSE)

if you don't want the behaviour you're seeing.

The point is not that I don't want this kind of behaviour. The point is that it seems to me to be unexpected and dangerous. I can indeed take precautions against it, now that I know about it, by specifying fill=FALSE. Given that I remember to do so.

Now that you've pointed it out I can see that this is the reason for the different behaviour between read.table() and read.csv(); in read.table() fill=FALSE is effectively the default. Having fill=TRUE as the default in read.csv() strikes me as being counter-intuitive and dangerous.

Rolf,

This is not to argue with your point re counter-intuitive, but I always run a count.fields() first if I haven't seen (or can't easily see) the file in my editor. I must have learned that the hard way a long time ago.

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

Absolutely agree with this Duncan. Playing around with .csv files is like playing with some sort of unstable explosive. I also avoid them as much as possible.
David Scott

Duncan Murdoch

--
David Scott
Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018
Director of Consulting, Department of Statistics
Re: [R] The behaviour of read.csv().
On 3/12/2010, at 3:48 PM, David Scott wrote:
On 03/12/10 14:33, Duncan Murdoch wrote:
SNIP

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

Absolutely agree with this Duncan. Playing around with .csv files is like playing with some sort of unstable explosive. I also avoid them as much as possible.

Where I work, everybody but me uses (yeuuccchhh!!!) Excel or SPSS. If we are to share data sets, *.csv files seem to be the most efficacious, if not the only, way to go. So far, we've had very few problems. The one that started off this thread is the only one I can think of that related to the *.csv format. At least *.csv files have the virtue of being ASCII files, whence if things go wrong it is at least possible to dig into them with a text editor and figure out just what the problem is.

cheers,

Rolf
Re: [R] The behaviour of read.csv().
On Dec 2, 2010, at 9:33 PM, Duncan Murdoch wrote:
On 02/12/2010 9:18 PM, David Winsemius wrote:
On Dec 2, 2010, at 8:33 PM, Duncan Murdoch wrote:
snipped

I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0. The comment in the NEWS file suggests it was in response to some strange csv file coming out of Excel.

The real problem with the CSV format is that there really isn't a well defined standard for it. The first RFC about it was published in 2005, and it doesn't claim to be authoritative. Excel is kind of a standard, but it does some very weird things. (For example: enter the string "01" into a field. To keep the leading 0, you need to type it as '01. Save the file, read it back: goodbye 0. At least that's what a website I was just on says about Excel, and what OpenOffice does.)

In both Excel and in OO.org you can select a column (or any other range) and set its format to text. (The default is numeric, not that different from read.table()'s default behavior.) Once a format has been set, you then do not need leading quotes. I just created a small example with OO.org Calc, entered leading 0's without leading quotes, and this code runs as desired after copying the three cells to the clipboard:

read.table(pipe("pbpaste"), colClasses="character")
    V1
1   01
2  004
3 0005

The same applies to date fields in both OO.org and Excel. In this regard, it is simply a matter of understanding what is the defined behavior of your software and how one can manipulate it. This is no different than learning R's classes, coercing them to your ends, and dealing with other formatting issues.

You're right, I shouldn't have picked on Excel particularly here, but it really is a bizarre format that says the default way to read a file containing

V1    # minor quibble. The V1 was added by read.table()
01
004
0005

is to assume that the column contains numeric values.

I'm a bit puzzled. Or maybe not.
If you are criticizing the default behavior of R's read.table then I do understand (but have been taught by my reading of the FM that numeric happens iff all of the first n entries _are_ coercible to numeric without NA generation, which is what one should expect). Excel is offering text exactly in the instances it has been told that the cell format is text.

(Yes, read.csv() makes this same assumption.) My main complaint is with the format.

Meaning the defaults chosen for read.csv()?

-- David.

Duncan Murdoch

I've been burned so many times by storing data in .csv files that I just avoid them whenever I can.

No argument there. I know one physician whose weapon of choice is Stata who always uses | as his separator, but that's perhaps because he works entirely in Windows. I imagine that might not be the most uncommon character in *NIXen.

--
David Winsemius, MD
West Hartford, CT