Re: [datatable-help] Using list valued columns with by (Matthew Dowle)

Matthew Dowle Wed, 22 Feb 2012 03:40:23 -0800

and if the lm() only uses a few columns of d, set .SDcols to those columns
and that'll speed it up (possibly by a lot).
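
A minimal sketch of the .SDcols idea, not taken from the thread: the
columns g, z and w are hypothetical and were added only so that there is a
predictor which is not the grouping column and a column the model never
uses. It assumes a data.table version that supports .SDcols.

```r
library(data.table)

set.seed(42)
# Hypothetical toy data: grouping column g, response y, predictor z,
# and an extra column w that the model never touches.
d <- data.table(g = rep(1:2, each = 10), z = rep(1:10, times = 2),
                y = rnorm(20), w = runif(20), key = "g")

# f() receives only the current group's rows (.SD), not the whole table.
f <- function(sd) list(pred = fitted(lm(y ~ z, data = sd)))

# .SDcols restricts .SD to the columns lm() actually needs,
# so w is not included in .SD for each group.
p <- d[, f(.SD), by = g, .SDcols = c("y", "z")]

dim(p)  # one fitted value per original row, plus the grouping column
```

Because f() is handed .SD rather than d, each group is fitted on its own
rows only, and the result has as many rows as the input.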


>
> Hi,
>
> That's passing 'd' to f(), which is the name of the whole data set
> (there isn't a column 'd'), so lm() is fitted to all 20 rows for each
> group. Try d[,f(.SD), by=x].
>
> Thanks for the kind words. If you haven't already done so, please do vote
> for the package on Crantastic (the 'I use it' button, and, also, the vote
> button). That may help others when they consider data.table for the first
> time. The more users, the more feedback and the more edge cases we catch,
> hopefully. Same goes for other packages.
>
>     http://crantastic.org/packages/data-table
>
> Matthew
>
>> I wanted to follow up on this as I am trying to do something similar
>> to what Chris asked about. But first, let me say thanks for the work
>> on this. I have several different situations where a call to ddply
>> takes about 10 minutes but only ~1 second with data.table. So I'm very
>> thankful for the package, but I'm still very much a novice with it.
>> Here's the present problem.
>>
>> Here's the toy data again:
>> d<-  data.table(x=rep(1:2,each=10), y=rnorm(20), key="x")
>>
>>> dim(d)
>> [1] 20  2
>>
>> I would like to generate a column of fitted values from lm that I'll
>> later cbind to the original data.
>>
>> f<- function(d) list(pred = fitted(lm(y ~ x,d)))
>>
>> p<- d[,f(d), by = x]
>>
>>> dim(p)
>> [1] 40  2
>>
>> For reasons I don't understand, this generates 2 sets of (correct)
>> "pred" values, but the "x" values are wrong. Why does this generate
>> two duplicate sets? I should say that the real data has ~2 million
>> rows and the call will be something closer to:
>> p<- d[,f(d), by = list(X1, X2, X3, X4)].
>>
>> Matthew
>>
>>
>>
>>
>>
>>
>>> or functional form :
>>>
>>> f <- function(y) list(a=mean(y), b=list(rep(y[1],3)) )
>>> data[, f(y), by=x]
>>>     x           a                                  b
>>> [1,] 1 -0.07760762 -0.1715334, -0.1715334, -0.1715334
>>> [2,] 2 0.36923570          1.01892, 1.01892, 1.01892
>>>
>>
>>
>>
>>
>>
>>>> data <- data.table(x=rep(1:2,each=10), y=rnorm(20), key="x")
>>>>
>>>> f <- function(y) {
>>>>   return( list(a=mean(y), b=rep(y[1],10)) )
>>>> }
>>>>
>>>> result <- data[, list(f(y)), by=x]
>>>>
>>>>
>>>> What happens is that V1 in result alternates between f(y)$a and
>>>> f(y)$b, giving 4 rows, 2 for each value of x. What I want instead
>>>> is for result to have 2 rows, with V1 being the list that gets
>>>> returned from f(y).
>>>>
>>>> I have found that this works:
>>>>
>>>> result <- data[, list(list(f(y))), by=x]
>>>>
>>>> But then I have to do:
>>>>
>>>> result[J(1),][,V1][[1]]
>>>>
>>>> to get the same thing I would get from f(result[J(1),][,V1]).  I want
>>>> to lose the [[1]] but I can't seem to see how I would do so.  Really
>>>> what I would envision is like with sapply, I want to do
>>>>
>>>>
>>>> result <- data[, f(y), by=x, simplify=FALSE]
>>>>
>>>> But of course simplify isn't an argument for data.table. Thoughts?
>>>>
>>>> -Chris
>>>> _______________________________________________
>>>> datatable-help mailing list
>>>> [email protected]
>>>> https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Tue, 21 Feb 2012 17:52:40 -0500
>>> From: Steve Lianoglou <[email protected]>
>>> To: [email protected]
>>> Cc: [email protected], Prasad Chalasani
>>>        <[email protected]>
>>> Subject: Re: [datatable-help] BUG: droplevels mangles subsetted
>>>        data.table
>>> Message-ID:
>>>      
>>>  <caha9mcnblauink9fjr2jnow10r8vfhyrfvu0upea4qjwfde...@mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Hi,
>>>
>>> I guess I'm missing something, but ... why isn't your proposed
>>> droplevels.data.table consistent with base? Because the ordering of
>>> the rows might change (maybe(?))?
>>>
>>> -steve
>>>
>>> On Tue, Feb 21, 2012 at 4:42 PM, Matthew Dowle <[email protected]>
>>> wrote:
>>>>
>>>> Yes, could do. Building on that here's a quick stab at
>>>> droplevels.data.table. This does it by reference, or it could take a
>>>> copy(). If it takes a copy() it would be consistent with base (probably
>>>> required), but then how best to make a non-copying version available?
>>>>
>>>> droplevels.data.table = function(dt) {
>>>>    oldkey = key( dt )
>>>>    for (i in names(dt)) {
>>>>        if (is.factor(dt[[i]])) dt[,i:=droplevels(dt[[i]]),with=FALSE]
>>>>    }
>>>>    setkeyv( dt, oldkey )
>>>>    dt
>>>> }
>>>>
>>>> On Tue, 2012-02-21 at 15:38 -0500, Prasad Chalasani wrote:
>>>>> Meanwhile as a work-around, I suppose one should do:
>>>>>
>>>>> keys <- key( dt ) # this could in general be a large set of keys
>>>>> sub_d <- droplevels( as.data.frame( dt[ name != 'a' ] ) )
>>>>> sub_dt <- data.table( sub_d )
>>>>> setkeyv( sub_dt, keys )
>>>>>
>>>>>
>>>>>
>>>>> On Feb 21, 2012, at 1:59 PM, Matthew Dowle wrote:
>>>>>
>>>>> >
>>>>> > I see the problem too but (just) adding droplevels.data.table might miss
>>>>> > the root cause.
>>>>> >
>>>>> >> because the way the
>>>>> >> droplevels.data.frame method works isn't compatible with data.table
>>>>> >> indexing.
>>>>> >
>>>>> > But it's intended to be. I can see the switch at the top of [.data.table
>>>>> > is detecting the caller isn't data.table aware, and it is then dispatching
>>>>> > to `[.data.frame` but why it then isn't working I'm not sure. Something to
>>>>> > do with the missing j or missing drop not being passed through correctly,
>>>>> > perhaps.
>>>>> >
>>>>> > I have heard it said (once or twice) that data.table is "almost"
>>>>> > compatible with non-data.table-aware packages, but never had an example
>>>>> > before. I wonder if this is it!
>>>>> >
>>>>> > A (fast) droplevels.data.table using := would be good anyway, though.
>>>>> > Matthew
>>>>> >
>>>>> >
>>>>> >
>>>>> >> Hi,
>>>>> >>
>>>>> >> I see what the problem is -- we need to provide a
>>>>> >> droplevels.data.table S3 method, because the way the
>>>>> >> droplevels.data.frame method works isn't compatible with data.table
>>>>> >> indexing.
>>>>> >>
>>>>> >> Will fix:
>>>>> >>
>>>>> >> https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1841&group_id=240&atid=975
>>>>> >>
>>>>> >> Thanks for raising the flag.
>>>>> >>
>>>>> >> Cheers,
>>>>> >> -steve
>>>>> >>
>>>>> >> On Tue, Feb 21, 2012 at 12:38 PM, pchalasani <[email protected]> wrote:
>>>>> >>> Surprising that this wasn't noticed before, or perhaps I'm not following
>>>>> >>> some recommended idiom to drop levels when using data.table. The following
>>>>> >>> code illustrates the bug clearly: The bug remains regardless of whether I
>>>>> >>> use "subset" or simply use dt1 = dt[ name != 'a' ].
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>    d <- data.table(name = c('a','b','c'), value = 1:3)
>>>>> >>>    dt <- data.table(d)
>>>>> >>>    setkey(dt,'name')
>>>>> >>>    dt1 <- subset(dt,name != 'a')  # or dt1 <- dt[ name != 'a' ]
>>>>> >>>    > dt1
>>>>> >>>          name value
>>>>> >>>     [1,]    b     2
>>>>> >>>     [2,]    c     3
>>>>> >>>
>>>>> >>>    > droplevels(dt1)
>>>>> >>>          name value
>>>>> >>>     [1,]    b     1
>>>>> >>>     [2,]    c     3
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> View this message in context:
>>>>> >>> http://r.789695.n4.nabble.com/BUG-droplevels-mangles-subsetted-data-table-tp4407694p4407694.html
>>>>> >>> Sent from the datatable-help mailing list archive at Nabble.com.
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Steve Lianoglou
>>>>> >> Graduate Student: Computational Systems Biology
>>>>> >>  | Memorial Sloan-Kettering Cancer Center
>>>>> >>  | Weill Medical College of Cornell University
>>>>> >> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>> >>
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>  | Memorial Sloan-Kettering Cancer Center
>>>  | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 3
>>> Date: Tue, 21 Feb 2012 23:22:59 +0000
>>> From: Matthew Dowle <[email protected]>
>>> To: Steve Lianoglou <[email protected]>
>>> Cc: [email protected]
>>> Subject: Re: [datatable-help] BUG: droplevels mangles subsetted
>>>        data.table
>>> Message-ID: <1329866579.2108.208.camel@netbook>
>>> Content-Type: text/plain; charset="UTF-8"
>>>
>>> Hi. Just because as it stands it doesn't copy, so
>>>
>>>    newDT = dropfactors(DT)
>>>
>>> would change DT by reference with newDT a new pointer to that same
>>> modified object, whereas base would leave DT unchanged with newDT a
>>> modified copy.
>>>
>>> Just adding dt=copy(dt) at the start of the function would make it
>>> consistent, but then how would we (data.table-aware code) call the
>>> non-copying version if we wanted that (which is likely needed, given the
>>> motivation of dropping unused levels I guess). Could continue the set*
>>> theme and create setdropfactors()? but that doesn't roll off the tongue.
>>> Or the copy() could be switched in the usual way :
>>>
>>>     if (!cedta) dt = copy(dt)
>>>
>>> and then we data.table users would just know that droplevels worked by
>>> reference and we should copy() first if we want a copy, in the usual
>>> way. Whilst not upsetting non-data.table-aware packages, since they
>>> would still copy. Think I prefer the switched copy, carefully
>>> documented, which would save yet another new function. I'm thinking that
>>> users' expectations of dropfactors() would probably be that it worked by
>>> reference on data.tables anyway (or if not, would want it to after the
>>> initial surprise).
>>>
>>> Matthew
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 4
>>> Date: Tue, 21 Feb 2012 21:24:33 -0500
>>> From: Steve Lianoglou <[email protected]>
>>> To: [email protected]
>>> Cc: [email protected]
>>> Subject: Re: [datatable-help] BUG: droplevels mangles subsetted
>>>        data.table
>>> Message-ID:
>>>      
>>>  <CAHA9McNzNWNS+=4pXwLwfj5GvnpUerJx9otUOV4pY1fEXfk=r...@mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Ahh, right ... the copying. Good point.
>>>
>>> Regarding the logic you suggest as to when to copy or not, how do you
>>> feel about going the explicit route instead of trying to take a best
>>> guess when we should/shouldn't copy via `cedta`, and doing the
>>> 'data.frame behavior' by default?
>>>
>>> By that I mean: since the droplevels function has a `...` param, can
>>> we do something like:
>>>
>>> droplevels.data.table <- function(x, except=NULL, do.copy=TRUE, ...) {
>>>   if (do.copy) {
>>>     x <- copy(x)
>>>   }
>>>   oldkey = key(x)
>>>   change.me <- names(x)
>>>   if (!is.null(except)) {
>>>     change.me <- setdiff(change.me, names(x)[except])
>>>   }
>>>   for (i in change.me) {
>>>     if (is.factor(x[[i]])) x[,i:=droplevels(x[[i]]),with=FALSE]
>>>   }
>>>   setkeyv( x, oldkey )
>>> }
>>>
>>> yay/nay?
>>>
>>> -steve
>>>
>>>
>>>
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>  | Memorial Sloan-Kettering Cancer Center
>>>  | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>>
>>> ------------------------------
>>>
>>>
>>> End of datatable-help Digest, Vol 24, Issue 9
>>> *********************************************

