[datatable-help] How to speed up grouping time series, help please

Daniele Amberti Tue, 05 Apr 2011 00:35:33 -0700

A few hundred times over, I retrieve a group of time series (10-15 series with 
10000 values each), and on every group I do some calculations, graphs, etc. I 
wonder if there is a faster method than the one presented below to get an 
appropriate time series object.


Querying with RODBC, for every group I get a data frame like this:

> X
  ID                DATE     VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706

And I want to get a timeSeries object or xts object like this:

                           1         2         3
2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
2000-01-01 00:00:01        NA 0.5281055 0.4533342
2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
2000-01-01 00:00:04        NA        NA 0.1029247

Both classes accept a matrix, so if I can create a matrix like the one shown 
above, plus a character vector of dates, faster than is possible with the xts 
merge method (merge.xts), for example, I will have a faster implementation. 
This is why I'm writing to datatable-help. I read the vignettes and tests, and 
experimented with generating a set of data.tables (using .SD and by = ID) and 
then using CJ, but without success so far. Any input to test this approach 
would be really appreciated.
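
One possible way to build such a matrix directly, without merging, is to fill 
an NA matrix through a two-column (row, column) index. This is only a minimal 
base R sketch: the helper name longToMatrix is hypothetical, and it assumes 
that DATE strings share one format (so they sort chronologically) and that 
each (ID, DATE) pair occurs at most once. It has not been timed against the 
xts version.

```r
# The 12 sample rows shown above, in the long layout returned by the query
X <- data.frame(
  ID    = c(3, 1, 1, 3, 3, 2, 2, 2, 3, 2, 1, 3),
  DATE  = paste("2000-01-01",
                c("00:00:03", "00:00:03", "00:00:00", "00:00:04",
                  "00:00:00", "00:00:03", "00:00:01", "00:00:00",
                  "00:00:01", "00:00:02", "00:00:02", "00:00:02")),
  VALUE = c(0.5726334, 0.8830174, 0.2875775, 0.1029247,
            0.9568333, 0.5514350, 0.5281055, 0.0455565,
            0.4533342, 0.8924190, 0.4089769, 0.6775706),
  stringsAsFactors = FALSE)

# Hypothetical helper: pivot long (ID, DATE, VALUE) data into a wide matrix
longToMatrix <- function(x) {
  ids   <- sort(unique(x$ID))
  dates <- sort(unique(x$DATE))  # same-format date strings sort chronologically
  # Start from an all-NA matrix so missing observations stay NA
  m <- matrix(NA_real_, nrow = length(dates), ncol = length(ids),
              dimnames = list(dates, as.character(ids)))
  # Two-column index matrix: one (row, column) pair per observation
  m[cbind(match(x$DATE, dates), match(x$ID, ids))] <- x$VALUE
  m
}

m <- longToMatrix(X)
```

The row names of m can then be converted once, for example with 
xts(m, as.POSIXct(rownames(m), tz = "GMT")), assuming the timestamps are GMT.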

Input data can be sorted or unsorted (the most complicated case is the one in 
the example: unsorted, with missing data). I can sort in the query if that 
gives any advantage.

Below is some code to generate the test case above.

Thanks in advance for any input, best regards,
Daniele


set.seed(123)
N <- 100 # number of observations, use 5 to replicate test case above
K <- 3   # number of timeseries ID

X <- data.frame(
 ID = rep(1:K, each = N),
 DATE = as.character(rep(as.POSIXct("2000-01-01", tz = "GMT")+ 0:(N-1), K)),
 VALUE = runif(N*K), stringsAsFactors = FALSE)

X <- X[sample(1:(N*K), N*K),] # sample observations to get random order (optional)
X <- X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),] # 20% missing

head(X, 15)


# an implementation in xts:
xtsSplit <- function(x)
{
 library(xts)
 x <- xts(x[,c("ID","VALUE")], as.POSIXct(x[,"DATE"]))
 x <- do.call(merge, split(x$VALUE,x$ID))
 return(x)
}

# time 50 runs (user CPU time) and report the median
xtsSplitTime <- replicate(50,
 system.time(xtsSplit(X))[[1]])
median(xtsSplitTime)
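
For comparison, the same long-to-wide reshape could be sketched with 
data.table. This assumes a data.table release that provides dcast for 
data.table objects, which is a later addition and may not be available in 
older versions. The sample values are made up for illustration, and no 
speed-up over the xts version is claimed here.

```r
library(data.table)

# Illustrative long-format data (made-up values, same columns as X)
long <- data.table(
  ID    = c(1L, 1L, 2L, 2L, 2L),
  DATE  = c("2000-01-01 00:00:00", "2000-01-01 00:00:02",
            "2000-01-01 00:00:00", "2000-01-01 00:00:01",
            "2000-01-01 00:00:02"),
  VALUE = c(0.1, 0.3, 0.4, 0.5, 0.6))

# One row per DATE, one column per ID; missing combinations become NA
wide <- dcast(long, DATE ~ ID, value.var = "VALUE")

# Drop the DATE column and keep the dates as row names of a numeric matrix
valueCols <- setdiff(names(wide), "DATE")
m2 <- as.matrix(wide[, valueCols, with = FALSE])
rownames(m2) <- wide$DATE
```

The resulting matrix and its row names could be passed to the xts or 
timeSeries constructor in the same way as a matrix built by hand.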


ORS Srl

Via Agostino Morando 1/3 12060 Roddi (Cn) - Italy
Tel. +39 0173 620211
Fax. +39 0173 620299 / +39 0173 433111
Web Site www.ors.it
