Re: [datatable-help] join results aren't always sorted?

Matthew Dowle Fri, 31 May 2013 02:00:43 -0700

Hi,

why is clicks but not shares sorted by time?

the groups (each unique time in this case) are returned in the order offirst appearance, when you use 'by'. This is important and relied upon.When this result is known, a quick check is made to see if this isordered (using the very fast is.unsorted()) and if so the result is thenmarked as keyed (which is the "sorted" attribute seen in the str()output).

What I actually want is a single data table keyed by time with ...


How about 'keyby' rather than 'by' :

    dates.dt[s.c=="SHARE", list(sum(count)), keyby="time"]

Even if I know the data is already sorted in group order, I often use'keyby' anyway for robustness.


Matthew


On 30.05.2013 20:48, Sam Steingold wrote:

Hi,
I have a table:
--8<---------------cut here---------------start------------->8---

str(dates.dt)

Classes ‘data.table’ and 'data.frame':  1343 obs. of  4 variables:
 $ sid  : chr  "missing" "missing" "missing" "missing" ...
 $ s.c  : chr  "CLICK" "CLICK" "CLICK" "CLICK" ...
 $ count: int  70559 71555 79985 84385 88147 94130 100195 109031
116890 129726 ...

$ time : POSIXct, format: "2013-05-15 00:00:00" "2013-05-1501:00:00" ...

 - attr(*, ".internal.selfref")=<externalptr>
 - attr(*, "sorted")= chr  "sid" "s.c" "time"

dates.dt

          sid   s.c count                time
   1: missing CLICK 70559 2013-05-15 00:00:00
   2: missing CLICK 71555 2013-05-15 01:00:00
   3: missing CLICK 79985 2013-05-15 02:00:00
   4: missing CLICK 84385 2013-05-15 03:00:00
   5: missing CLICK 88147 2013-05-15 04:00:00
  ---
1339: present SHARE 35295 2013-05-28 19:00:00
1340: present SHARE 36284 2013-05-28 20:00:00
1341: present SHARE 69504 2013-05-28 21:00:00
1342: present SHARE 67037 2013-05-28 22:00:00
1343: present SHARE 61014 2013-05-28 23:00:00
--8<---------------cut here---------------end--------------->8---
I summarise them by various fields:
--8<---------------cut here---------------start------------->8---

shares <- dates.dt[s.c=="SHARE", list(sum(count)) , by="time"]
clicks <- dates.dt[s.c=="CLICK", list(sum(count)) , by="time"]
str(shares)

Classes ‘data.table’ and 'data.frame':  336 obs. of  2 variables:

$ time: POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00"...

 $ V1  : int  60531 57837 67495 76716 83465 86822 91318 100520 112352
124784 ...
 - attr(*, ".internal.selfref")=<externalptr>

str(clicks)

Classes ‘data.table’ and 'data.frame':  336 obs. of  2 variables:

$ time: POSIXct, format: "2013-05-15 00:00:00" "2013-05-15 01:00:00"...

 $ V1  : int  129450 137222 157721 171319 183720 195652 216003 238295
260715 279235 ...
 - attr(*, "sorted")= chr "time"
 - attr(*, ".internal.selfref")=<externalptr>
--8<---------------cut here---------------end--------------->8---
why is clicks but not shares sorted by time?

(if I make "time" the first key in dates.dt, the problem goes away,so,

I guess, this is expected).

What I actually want is a single data table keyed by time withcolumns

shares,clicks,missing,present,missing/clicks &c
I can, obviously, construct it by hand:
--8<---------------cut here---------------start------------->8---
setkeyv(shares,"time")
stopifnot(identical(shares$time,clicks$time))

dt <- data.table(time=shares$time, clicks=clicks$V1,shares=shares$V1)

--8<---------------cut here---------------end--------------->8---
but I was wondering if there is a better way.
Thanks.

_______________________________________________
datatable-help mailing list
[email protected]
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help

Re: [datatable-help] join results aren't always sorted?

Reply via email to