Re: [R] why must a named colClasses in read.table be in correct order
Thanks for insisting; I was wrong, and I'm happy to see that there is indeed code intended for named 'colClasses', which even goes back to 2004. But, as you report, the names are only used when length(colClasses) < cols (which also explains why I thought it was not supported). I'm not sure whether that _strictly less than_ test is intentional or a mistake, but I would propose the following patch:

[HB-X201]{hb}: svn diff src\library\utils\R\readtable.R
Index: src/library/utils/R/readtable.R
===================================================================
--- src/library/utils/R/readtable.R   (revision 68642)
+++ src/library/utils/R/readtable.R   (working copy)
@@ -139,7 +139,7 @@
     if (rlabp) col.names <- c(row.names, col.names)
 
     nmColClasses <- names(colClasses)
-    if(length(colClasses) < cols)
+    if(length(colClasses) <= cols)
         if(is.null(nmColClasses)) {
             colClasses <- rep_len(colClasses, cols)
         } else {

Your example works with this patch. I've made it source():able so you can try it out (if you cannot source() an https:// URL, download the file and source it locally):

source("https://gist.githubusercontent.com/HenrikBengtsson/ed1eeb41a1b4d6c43b47/raw/ebe58f76e518dd014423bea466a5c93d2efd3c99/readtable-fix.R")

kkk <- c("a\tb", "3.14\tx")
colClasses <- c(a="numeric", b="character")

data <- read.table(textConnection(kkk), sep="\t", header = TRUE, colClasses = colClasses)
str(data)
### 'data.frame': 1 obs. of 2 variables:
###  $ a: num 3.14
###  $ b: chr "x"

## Does not work with utils::read.table(), but does with the patch
data <- read.table(textConnection(kkk), sep="\t", header = TRUE, colClasses = rev(colClasses))
str(data)
### 'data.frame': 1 obs. of 2 variables:
###  $ a: num 3.14
###  $ b: chr "x"

Let's hope that the above is a (10-year-old) typo, and that changing < to <= adds support for named 'colClasses', which is really useful functionality.

/Henrik

On Wed, Jul 8, 2015 at 6:42 PM, Andreas Leha <andreas.l...@med.uni-goettingen.de> wrote:
> Hi Henrik,
>
> Thanks for your reply. I am not (yet) convinced, though.
> The help page for read.table mentions named colClasses, and if I specify
> colClasses for only some of the columns, the names are taken into account:
>
> --8<---cut here---start->8---
> kkk <- c("a\tb", "3.14\tx")
> str(read.table(textConnection(kkk), sep="\t", header = TRUE))
> str(read.table(textConnection(kkk), sep="\t", header = TRUE, colClasses=c(b="character")))
> --8<---cut here---end--->8---
>
> What am I missing?
>
> Best,
> Andreas
>
> On 09/07/2015 02:21, Henrik Bengtsson wrote:
>> read.table() does not make use of names(colClasses) - only its values.
>> Because of this, ordering is critical, as you noted. It shouldn't be too
>> hard to add support for a named `colClasses` argument of
>> utils::read.table(), but someone needs to convince the R core team that
>> this is a good idea. As an alternative, see R.filesets::readDataFrame()
>> for a read.table()-like function that matches names(colClasses) to
>> column names, if they exist.
>>
>> /Henrik
>> (author of R.filesets)
>>
>> On Wed, Jul 8, 2015 at 5:41 PM, Andreas Leha
>> <andreas.l...@med.uni-goettingen.de> wrote:
>>> Hi all,
>>>
>>> Apparently, the colClasses argument to read.table needs to be in the
>>> order of the columns *even when it is named*. Why is that? And where
>>> would I find it in the documentation?
>>>
>>> Here is a MWE:
>>> --8<---cut here---start->8---
>>> kkk <- c("a\tb", "3.14\tx")
>>> read.table(textConnection(kkk), sep="\t", header = TRUE)
>>>
>>> cclasses=c(b="character", a="numeric")
>>>
>>> read.table(textConnection(kkk), sep="\t", header = TRUE, colClasses = cclasses) ## <--- error
>>>
>>> read.table(textConnection(kkk), sep="\t", header = TRUE, colClasses = cclasses[order(names(cclasses))])
>>> --8<---cut here---end--->8---
>>>
>>> Thanks,
>>> Andreas
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
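A minimal, hypothetical sketch of why the one-character change in the patch matters. `resolve_col_classes()` is not a real function: only its length test mirrors the quoted diff, while the name-matching body is an assumption modelled on the behaviour Andreas observed (names honoured when fewer classes than columns are given), not a copy of the utils source.

```r
# Hypothetical, simplified model of the colClasses branch in the diff above.
# Only the `<` vs `<=` length test mirrors the patch; the name-matching body
# is an illustrative assumption, not the actual utils::read.table() code.
resolve_col_classes <- function(colClasses, col.names, patched = FALSE) {
  cols <- length(col.names)
  use_branch <- if (patched) length(colClasses) <= cols else length(colClasses) < cols
  if (use_branch) {
    if (is.null(names(colClasses))) {
      # Unnamed: recycle the classes over all columns
      colClasses <- rep_len(colClasses, cols)
    } else {
      # Named: place each class at the position of the matching column,
      # leaving NA (= let read.table guess) for unmentioned columns
      tmp <- rep_len(NA_character_, cols)
      names(tmp) <- col.names
      i <- match(names(colClasses), col.names, 0L)
      tmp[i[i > 0L]] <- colClasses[i > 0L]
      colClasses <- tmp
    }
  }
  # Whatever comes out is then applied positionally, column by column
  unname(colClasses)
}

cc <- c(b = "character", a = "numeric")
resolve_col_classes(cc, c("a", "b"))                   # "character" "numeric": names ignored
resolve_col_classes(cc, c("a", "b"), patched = TRUE)   # "numeric" "character": names matched
resolve_col_classes(c(b = "character"), c("a", "b"))   # NA "character": partial vector works
```

The third call corresponds to Andreas's `colClasses=c(b="character")` example, which already worked before the patch because its length is strictly less than the number of columns.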
Re: [R] why must a named colClasses in read.table be in correct order
Hi Henrik,

Thank you very much for looking into this. And thanks for the patch!

Yes, let's hope this is a typo that gets fixed.

Regards,
Andreas

Henrik Bengtsson <henrik.bengts...@ucsf.edu> writes:
> Thanks for insisting; I was wrong and I'm happy to see that there is indeed
> code intended for named 'colClasses', which even goes back to 2004.
> [...]
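Until a fix like the one proposed above is applied, one version-independent workaround is to reorder a named colClasses vector into the header's column order yourself, so read.table() receives the right classes whether it matches them by name or by position. The helper `align_col_classes()` below is hypothetical (not part of utils), and splitting the header with `strsplit()` assumes a single tab-separated header line, as in the thread's example.

```r
# Hypothetical helper (not part of utils): reorder a named colClasses vector
# so that it follows the column order of the header. Columns not named in
# colClasses get NA, which tells read.table() to guess their class.
align_col_classes <- function(col.names, colClasses) {
  out <- rep(NA_character_, length(col.names))
  names(out) <- col.names
  idx <- match(names(colClasses), col.names)
  if (anyNA(idx))
    warning("some names in 'colClasses' match no column: ",
            paste(names(colClasses)[is.na(idx)], collapse = ", "))
  out[idx[!is.na(idx)]] <- colClasses[!is.na(idx)]
  out
}

kkk <- c("a\tb", "3.14\tx")
cclasses <- c(b = "character", a = "numeric")   # deliberately out of order

# Header names taken from the first line (assumes a tab-separated header)
hdr <- strsplit(kkk[1], "\t", fixed = TRUE)[[1]]
cc <- align_col_classes(hdr, cclasses)           # c(a = "numeric", b = "character")

data <- read.table(textConnection(kkk), sep = "\t", header = TRUE,
                   colClasses = cc)
str(data)
```

Because the aligned vector has one entry per column in header order, its positional and by-name interpretations agree. For a file, `readLines(file, n = 1)` can supply the header line instead of `kkk[1]`; R.filesets::readDataFrame(), mentioned earlier in the thread, is an alternative that does the name matching for you.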