Re: [R] why must a named colClasses in read.table be in correct order

2015-07-08 Thread Henrik Bengtsson
Thanks for insisting; I was wrong and I'm happy to see that there is
indeed code intended for named 'colClasses', which even goes back to
2004.  But as you report, the names only work when
length(colClasses) < cols (which also explains why I thought it was not
supported).  I'm not sure if that _strictly less than_ test is
intentional or a mistake, but I would propose the following patch:

[HB-X201]{hb}: svn diff src\library\utils\R\readtable.R
Index: src/library/utils/R/readtable.R
===================================================================
--- src/library/utils/R/readtable.R (revision 68642)
+++ src/library/utils/R/readtable.R (working copy)
@@ -139,7 +139,7 @@
     if (rlabp) col.names <- c("row.names", col.names)
 
     nmColClasses <- names(colClasses)
-    if(length(colClasses) < cols)
+    if(length(colClasses) <= cols)
         if(is.null(nmColClasses)) {
             colClasses <- rep_len(colClasses, cols)
         } else {
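
To illustrate what name-based matching amounts to, here is a minimal
sketch.  The helper match_col_classes() is hypothetical and is not the
code in utils::read.table(); it only shows the idea of expanding
'colClasses' to one entry per column, with NA meaning "let read.table()
guess the type":

```r
# Hypothetical helper (not part of utils): expand 'colClasses' to one
# entry per column, matching by name when names are present.
match_col_classes <- function(colClasses, col.names) {
  cols <- length(col.names)
  # Start with NA for every column; NA means "guess the type".
  out <- rep(NA_character_, cols)
  names(out) <- col.names
  nm <- names(colClasses)
  if (is.null(nm)) {
    # Unnamed: recycle the values positionally.
    out[] <- rep_len(colClasses, cols)
  } else {
    # Named: fill in only the columns that are mentioned, in any order.
    known <- intersect(nm, col.names)
    out[known] <- colClasses[known]
  }
  out
}

# Order of the names does not matter.
full <- match_col_classes(c(b = "character", a = "numeric"), c("a", "b"))
# Columns that are not mentioned stay NA.
part <- match_col_classes(c(b = "character"), c("a", "b"))
```

Because the result always has one entry per column, in column order, it
no longer matters whether the caller supplied fewer names than columns
or exactly as many.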


Your example works with this patch.  I've made it source():able so you
can try it out (if you cannot source() https://, then download the
file and source it locally):

source("https://gist.githubusercontent.com/HenrikBengtsson/ed1eeb41a1b4d6c43b47/raw/ebe58f76e518dd014423bea466a5c93d2efd3c99/readtable-fix.R")

kkk <- c("a\tb",
         "3.14\tx")

colClasses <- c(a="numeric", b="character")
data <- read.table(textConnection(kkk),
                   sep="\t",
                   header = TRUE,
                   colClasses = colClasses)
str(data)
### 'data.frame':   1 obs. of  2 variables:
###  $ a: num 3.14
###  $ b: chr "x"

## Does not work with utils::read.table(), but with patch
data <- read.table(textConnection(kkk),
                   sep="\t",
                   header = TRUE,
                   colClasses = rev(colClasses))
str(data)
### 'data.frame':   1 obs. of  2 variables:
###  $ a: num 3.14
###  $ b: chr "x"

Let's hope that the above is a (10-year-old) typo, and that changing a <
to a <= adds support for named 'colClasses', which is really useful
functionality.

/Henrik

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why must a named colClasses in read.table be in correct order

2015-07-08 Thread Andreas Leha
Hi Henrik,

Thanks for your reply.

I am not (yet) convinced, though.  The help page for read.table
mentions named colClasses, and if I specify colClasses for only some of
the columns, the names are taken into account:

--8<---------------cut here---------------start------------->8---
kkk <- c("a\tb",
         "3.14\tx")
str(read.table(textConnection(kkk),
               sep="\t",
               header = TRUE))

str(read.table(textConnection(kkk),
               sep="\t",
               header = TRUE,
               colClasses=c(b="character")))
--8<---------------cut here---------------end--------------->8---
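
A compact way to check the partial case is to look at the column classes
directly.  This is only a sketch of the same example; it assumes the
'text' argument of read.table() in place of textConnection(), and the
name 'partial' is just for illustration:

```r
kkk <- c("a\tb",
         "3.14\tx")

# Only column 'b' is specified; 'a' is left for read.table() to guess.
partial <- read.table(text = kkk, sep = "\t", header = TRUE,
                      colClasses = c(b = "character"))
sapply(partial, class)
```

If the names are honoured, 'b' comes back as character and 'a' is
converted to numeric as usual.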

What am I missing?

Best,
Andreas



On 09/07/2015 02:21, Henrik Bengtsson wrote:
> read.table() does not make use of names(colClasses) - only its values.
> Because of this, ordering is critical, as you noted. It shouldn't be
> too hard to add support for a named `colClasses` argument of
> utils::read.table(), but someone needs to convince the R core team
> that this is a good idea.
>
> As an alternative, see R.filesets::readDataFrame() for a
> read.table()-like function that matches names(colClasses) to column
> names, if they exist.
>
> /Henrik
> (author of R.filesets)
 


Re: [R] why must a named colClasses in read.table be in correct order

2015-07-08 Thread Andreas Leha
Hi Henrik,

Thank you very much for looking into this.  And thanks for the patch!

Yes, let's hope this is a typo that gets fixed.

Regards,
Andreas


[R] why must a named colClasses in read.table be in correct order

2015-07-08 Thread Andreas Leha
Hi all,

Apparently, the colClasses argument to read.table needs to be in the
order of the columns *even when it is named*.  Why is that?  And where
would I find it in the documentation?

Here is a MWE:

--8<---------------cut here---------------start------------->8---
kkk <- c("a\tb",
         "3.14\tx")
read.table(textConnection(kkk),
           sep="\t",
           header = TRUE)

cclasses=c(b="character",
           a="numeric")

read.table(textConnection(kkk),
           sep="\t",
           header = TRUE,
           colClasses = cclasses)  ## <--- error

read.table(textConnection(kkk),
           sep="\t",
           header = TRUE,
           colClasses = cclasses[order(names(cclasses))])
--8<---------------cut here---------------end--------------->8---
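
The order() call above only works because the column names happen to be
in alphabetical order.  A more general workaround might look like the
following sketch, which reads the header first and then indexes the
classes by name.  It assumes the 'text' argument of read.table(), that
every column is named in 'cclasses', and that the header names are not
altered by check.names; 'hdr' and 'dat' are illustrative names:

```r
kkk <- c("a\tb",
         "3.14\tx")
cclasses <- c(b = "character", a = "numeric")

# Read the header plus one row just to learn the column order.
hdr <- names(read.table(text = kkk, sep = "\t", header = TRUE, nrows = 1))

# Index by name so that 'cclasses' follows the column order of the input.
dat <- read.table(text = kkk, sep = "\t", header = TRUE,
                  colClasses = cclasses[hdr])
str(dat)
```

This reads the start of the input twice, which is cheap for a header
line but worth keeping in mind for non-seekable connections.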


Thanks,
Andreas
