I sent this to Matthew offlist but he wants it "on the record", so here is what I sent:
On Aug 31, 2010, at 11:56 AM, David Winsemius wrote:


On Aug 31, 2010, at 3:35 AM, Matthew Dowle wrote:


Nicolas,

Welcome to the list.

Where the documentation mentions 'quoted' it means the quote() function
to create an expression, not as in a character string.


Matthew;

I think you really should look at FAQ 1.5. It says nothing about "quoted". It does appear to imply that if someone had executed:

colname="x"

... that both DT[, colname, with=FALSE] and DT[, eval(colname)] should "work". Now you are saying that isn't so, that only the first will return anything like the expected result.

--
David

Alternatively you
can use [[ in the usual way since a data.table is a list.

colexp = quote(y)   # rather than "y"
a[,eval(colexp)]
[1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
[5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
[9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"

or

colname = "y"
a[[colname]]
[1] "2010-01-01 GMT" "2010-01-02 GMT" "2010-01-03 GMT" "2010-01-04 GMT"
[5] "2010-01-05 GMT" "2010-01-06 GMT" "2010-01-07 GMT" "2010-01-08 GMT"
[9] "2010-01-09 GMT" "2010-01-10 GMT" "2010-01-11 GMT"
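
To illustrate why [[ keeps the column type while as.matrix does not, here is a small sketch in base R. It uses a plain data.frame as a stand-in for the data.table (an assumption for illustration only, chosen so the example runs without any package); the names df, v and m are hypothetical and not from the thread.

```r
# Assumption: a plain data.frame stands in for the data.table 'a'.
# Both are lists of columns, so [[ extracts a column unchanged.
df <- data.frame(
  x = seq(1, 2, by = 0.1),
  y = as.POSIXct("2010-01-01", tz = "GMT") + 86400 * 0:10  # 11 daily timestamps
)

colname <- "y"

# List-style extraction: returns the column itself, class and
# timezone attribute included.
v <- df[[colname]]
print(class(v))
print(attr(v, "tzone"))

# Matrix conversion: a date-time column is coerced to character,
# so the POSIXct class is lost.
m <- as.matrix(df[, colname, drop = FALSE])
print(mode(m))
```

Since [[ hands back the stored column rather than a converted copy, no round trip through character strings is needed.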


A single column name is a special case of an expression, so although this
can create a steeper learning curve, it results in more power and
flexibility later.
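
The difference between a character string and a quoted expression can be sketched in base R as follows. This is only an illustration under stated assumptions: a plain data.frame stands in for the data.table, eval() is given the data as its evaluation environment, and the variable names (df, res_string, res_quoted, res_asname) are hypothetical. The as.name() conversion is a general R idiom; whether it behaves the same inside a[, ...] in data.table 1.4.1 is not confirmed by this thread.

```r
# Assumption: a plain data.frame stands in for the data.table.
df <- data.frame(x = c(1.0, 1.1, 1.2),
                 y = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

colname <- "y"        # a character string
colexp  <- quote(y)   # an unevaluated symbol (a "quoted" expression)

# Evaluating a character string just gives the string back, which
# matches the [1] "y" result reported in the original question.
res_string <- eval(colname, df)
print(res_string)

# Evaluating the quoted symbol looks up the column named y.
res_quoted <- eval(colexp, df)
print(res_quoted)

# A column name held as a string can be turned into a symbol first.
res_asname <- eval(as.name(colname), df)
print(res_asname)
```

In other words, eval() only has something to look up when it receives a symbol or call; a string is already a value.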

Suggestions on how to improve documentation so that 'quoting' is clearer
are very welcome. I've added an item to the list so we don't forget.

Matthew


On Mon, 2010-08-30 at 23:59 -0400, Nicolas Chapados wrote:
Dear data.table friends and maintainers,


First, thanks to the authors for this excellent package: it really
fills a void in the R world. However, I have a question: I'm looking
to have an efficient conversion of a data table object to a vector (of
the correct type) when querying a single column whose name is stored
in a variable.  As per the vignette and the FAQ, I use the syntax


  my.data.table[, colname, with=FALSE]


(where colname is a variable containing my desired column name) but
this returns another data table, not a vector.  Moreover, the eval
syntax suggested in the FAQ simply does not work:


  my.data.table[, eval(colname)]


See example below.  I could use as.matrix on the result, but this
carries out undesirable type conversion in the case of columns
containing dates: see below.


Here is an example to reproduce this problem:


require(data.table)
Loading required package: data.table
a <- data.table(x=seq(1, 2, by=0.1), y=seq(as.POSIXct("2010-01-01"),
as.POSIXct("2010-01-11"), length.out=11))
a
      x          y
[1,] 1.0 2010-01-01
[2,] 1.1 2010-01-02
[3,] 1.2 2010-01-03
[4,] 1.3 2010-01-04
[5,] 1.4 2010-01-05
[6,] 1.5 2010-01-06
[7,] 1.6 2010-01-07
[8,] 1.7 2010-01-08
[9,] 1.8 2010-01-09
[10,] 1.9 2010-01-10
[11,] 2.0 2010-01-11
colname <- "y"


## The following returns a data table.  How can I get a vector, and
## still preserve type information?
a[, colname, with=FALSE]
             y
[1,] 2010-01-01
[2,] 2010-01-02
[3,] 2010-01-03
[4,] 2010-01-04
[5,] 2010-01-05
[6,] 2010-01-06
[7,] 2010-01-07
[8,] 2010-01-08
[9,] 2010-01-09
[10,] 2010-01-10
[11,] 2010-01-11


## The eval recipe suggested in the FAQ does not work.
a[, eval(colname)]
[1] "y"


## as.vector does not convert away from data.table
as.vector(a[, colname, with=FALSE])
             y
[1,] 2010-01-01
[2,] 2010-01-02
[3,] 2010-01-03
[4,] 2010-01-04
[5,] 2010-01-05
[6,] 2010-01-06
[7,] 2010-01-07
[8,] 2010-01-08
[9,] 2010-01-09
[10,] 2010-01-10
[11,] 2010-01-11
class(as.vector(a[, colname, with=FALSE]))
[1] "data.table"


## as.matrix loses type information (NOTE: in my case it is not
## acceptable to convert this character vector back to a POSIXct, due
## to the loss of important timezone information. Furthermore, this
## would be very inefficient.)
as.matrix(a[, colname, with=FALSE])
    y
[1,] "2010-01-01"
[2,] "2010-01-02"
[3,] "2010-01-03"
[4,] "2010-01-04"
[5,] "2010-01-05"
[6,] "2010-01-06"
[7,] "2010-01-07"
[8,] "2010-01-08"
[9,] "2010-01-09"
[10,] "2010-01-10"
[11,] "2010-01-11"
mode(as.matrix(a[, colname, with=FALSE]))
[1] "character"


## Finally, one could go through a data.frame, but this is inefficient
## and it sort of defeats the purpose of using data.table...
as.data.frame(a[, colname, with=FALSE])[, colname]
[1] "2010-01-01 EST" "2010-01-02 EST" "2010-01-03 EST" "2010-01-04 EST"
[5] "2010-01-05 EST" "2010-01-06 EST" "2010-01-07 EST" "2010-01-08 EST"
[9] "2010-01-09 EST" "2010-01-10 EST" "2010-01-11 EST"




So at this point, my imagination is running out and I'm turning to
this list for suggestions. This would seem to be a fairly frequent
use case, and I'm surprised it does not appear to have been addressed
previously.


For the record, here is my sessionInfo()


sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-pc-linux-gnu


locale:
C


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base



other attached packages:
[1] data.table_1.4.1




Thanks in advance for any help!
+ Nicolas Chapados
_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help



David Winsemius, MD
West Hartford, CT

