Re: [R] Quelplot

2011-09-22 Thread Den Alpin
Hi Hadley,
have a look at:

http://www.riani.it/pub/zrc-csda98.pdf
and some Gauss code:
http://www.riani.it/Gauss/procedures/BOXPLOTB.G

Best regards,
Daniele


2011/9/21 Hadley Wickham had...@rice.edu:
 Hi all,

 Does anyone have an R implementation of the queplot (K. M. Goldberg
 and B. Iglewicz. Bivariate extensions of the boxplot. Technometrics,
 34(3):pp. 307–320, 1992)?  I'm struggling with the estimation of the
 asymmetry parameters.

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to speed up grouping time series, help please

2011-04-07 Thread Den Alpin
I found a faster implementation (by an order of magnitude from my
tests) than the one using xts, split, merge (from Joshua).
I report the two fastest solution below with code to generate a test
case; some work still to be done for columns order and naming,
Test case has grown from my previous post to get a more realistic timing.

Any comment or idea to further speed up multivariate time series
creation with classes xts or timeSeries starting from a data.frame
like the one reported here is welcome.

Best regards,
Den


a data.frame example (code below to generate it)

  IDDATE VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706

And I want to get a timeSeries object or xts object like this:

   1 2 3
2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
2000-01-01 00:00:01NA 0.5281055 0.4533342
2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
2000-01-01 00:00:04NANA 0.1029247

# CODE:

set.seed(123)
# set N to 5 to reproduce above data.frame
N - 1000
# set K to 3 to reproduce above data.frame
K - 10
X - data.frame(
  ID = rep(1:K, each = N),
  DATE = as.character(rep(as.POSIXct(2000-01-01, tz = GMT)+ 0:(N-1), K)),
  VALUE = runif(N*K), stringsAsFactors = FALSE)
X - X[sample(1:(N*K), N*K),]
X - X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),]
str(X)


xtsSplit - function(x)
{
  library(xts)
  x - xts(x[,c(ID,VALUE)], as.POSIXct(x[,DATE]))
  return(do.call(merge, split(x$VALUE,x$ID)))
}
xtsSplitTime - replicate(50,
  system.time(xtsSplit(X))[[1]])
median(xtsSplitTime)

xtsReshape - function(x)
{
  library(xts)
  x - reshape(x, idvar = DATE, timevar = ID, direction = wide)
  x - xts(x[,-1], as.POSIXct(x[,1]))
  return(x)
}
xtsReshapeTime - replicate(50,
  system.time(xtsReshape(X))[[1]])
median(xtsReshapeTime)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to speed up grouping time series, help please

2011-04-07 Thread Den Alpin
I found a faster implementation (by an order of magnitude from my
tests) than the one using xts, split, merge (from Joshua).
I report the two fastest solution below with code to generate a test
case; some work still to be done for columns order and naming,
Test case has grown from my previous post to get a more realistic timing.

Any comment or idea to further speed up multivariate time series
creation with classes xts or timeSeries starting from a data.frame
like the one reported here is welcome.

Best regards,
Den


a data.frame example (code below to generate it)

 ID                DATE     VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706

And I want to get a timeSeries object or xts object like this:

                          1         2         3
2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
2000-01-01 00:00:01        NA 0.5281055 0.4533342
2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
2000-01-01 00:00:04        NA        NA 0.1029247

# CODE:

set.seed(123)
# set N to 5 to reproduce above data.frame
N - 1000
# set K to 3 to reproduce above data.frame
K - 10
X - data.frame(
 ID = rep(1:K, each = N),
 DATE = as.character(rep(as.POSIXct(2000-01-01, tz = GMT)+ 0:(N-1), K)),
 VALUE = runif(N*K), stringsAsFactors = FALSE)
X - X[sample(1:(N*K), N*K),]
X - X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),]
str(X)


xtsSplit - function(x)
{
 library(xts)
 x - xts(x[,c(ID,VALUE)], as.POSIXct(x[,DATE]))
 return(do.call(merge, split(x$VALUE,x$ID)))
}
xtsSplitTime - replicate(50,
 system.time(xtsSplit(X))[[1]])
median(xtsSplitTime)

xtsReshape - function(x)
{
 library(xts)
 x - reshape(x, idvar = DATE, timevar = ID, direction = wide)
 x - xts(x[,-1], as.POSIXct(x[,1]))
 return(x)
}
xtsReshapeTime - replicate(50,
 system.time(xtsReshape(X))[[1]])
median(xtsReshapeTime)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to speed up grouping time series, help please

2011-04-04 Thread Den Alpin
I retrieve for a few hundred times a group of time series (10-15 ts
with 1 values each), on every group I do some calculation, graphs
etc. I wonder if there is a faster method than what presented below to
get an appropriate timeseries object.

Making a query with RODBC for every group I get a data frame like this:

 X
   IDDATE VALUE
14  3 2000-01-01 00:00:03 0.5726334
4   1 2000-01-01 00:00:03 0.8830174
1   1 2000-01-01 00:00:00 0.2875775
15  3 2000-01-01 00:00:04 0.1029247
11  3 2000-01-01 00:00:00 0.9568333
9   2 2000-01-01 00:00:03 0.5514350
7   2 2000-01-01 00:00:01 0.5281055
6   2 2000-01-01 00:00:00 0.0455565
12  3 2000-01-01 00:00:01 0.4533342
8   2 2000-01-01 00:00:02 0.8924190
3   1 2000-01-01 00:00:02 0.4089769
13  3 2000-01-01 00:00:02 0.6775706

And I want to get a timeSeries object or xts object like this:

1 2 3
2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
2000-01-01 00:00:01NA 0.5281055 0.4533342
2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
2000-01-01 00:00:04NANA 0.1029247


Input data can be sorted or unsorted (the most complicated case is in
the example, unsorted and missing data) in the sense that I can sort
in query if I can take an advantage from this.

Some considerations:
- Xts is generally faster than timeSeries
- both accept a matrix so if I can create a matrix like the one
represented above and an array of characters representing dates faster
than what possible with xts:::cbind, for examole,I will have a faster
implementation (package data.table ?).
- create timeseries objects in multithread and then merge (package plyr ?)
- faster merge algorithms?

Below some code to generate the test case above:


set.seed(123)
N - 5 # number of observations
K - 3 # number of timeseries ID

X - data.frame(
  ID = rep(1:K, each = N),
  DATE = as.character(rep(as.POSIXct(2000-01-01, tz = GMT)+ 0:(N-1), K)),
  VALUE = runif(N*K), stringsAsFactors = FALSE)

X - X[sample(1:(N*K), N*K),] # sample observations to get random
order (optional)
X - X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),] # 20% missing

head(X, 15)

# use explicitly environments to avoid '-'
buildTimeSeriesFromDataFrame - function(x, env)
{
  {
if(exists(xx, envir = env)) # if exist variable xx in env cbind
  assign(xx,
cbind(get(xx, env), timeSeries(x$VALUE, x$DATE,
  format = '%Y-%m-%d %H:%M:%S',
  zone = 'GMT', units = as.character(x$ID[1]))),
envir = env)
else  # create xx in env
  assign(xx,
timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
  zone = 'GMT', units = as.character(x$ID[1])),
envir = env)

return(TRUE)
  }
}

# use package plyr, faster than 'by' function
tsDaply - function(...)
{
  library(plyr)
  e1 - new.env(parent = baseenv()) #create a new env
  res - daply(X, ID, buildTimeSeriesFromDataFrame,
  env = e1)
  return(get(xx, e1)) # return xx from env
}

##replicate 100 times
#Time03 - replicate(100,
#  system.time(tsDaply(X, X$ID))[[1]])
#median(Time03)

# result
tsDaply(X, X$ID)


Thanks in advance for any input, best regards,
Den

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to speed up grouping time series, help please

2011-04-04 Thread Den Alpin
I did some tests on Your and Gabor solutions, below my findings:

- Your solution is fast as my solution in xts (below) but MUCH MORE
READABLE, in particular I think your test should take into account xts
creation from the data.frame (see below);
- Gabor's solution with read.zoo is fast as xts but give an xts object
that has some problems with time zones.

Any better idea to speed up grouping time series?

Thanks!

Below few line of codes to test (I suggest to grow X size to get
better comparison results):
xtsSplit - function(x)
{
  x - xts(x[,c(ID,VALUE)], as.POSIXct(x[,DATE]))
  x - do.call(merge, split(x$VALUE,x$ID))
  return(x)
}

xtsSplitTime - replicate(100,
  system.time(xtsSplit(X))[[1]])
median(xtsTime)

zooReadTime - replicate(100,
 system.time(z - read.zoo(X, split = 1, index = 2, tz = ))[[1]])
median(zooReadTime)

And my (unreadable) solution:
library(xts)
buildXtsFromDataFrame - function(x, env)
{
  {
if(exists(xx, envir = env))
{
  VALUE - as.matrix(x$VALUE)
  colnames(VALUE) - as.character(x$ID[1])
  assign(xx,
cbind(get(xx, env), xts(VALUE,
  as.POSIXct(x$DATE, tz = GMT,
format = '%Y-%m-%d %H:%M:%S'),
  tzone = 'GMT')),
envir = env)
} else
{
  VALUE - as.matrix(x$VALUE)
  colnames(VALUE) - as.character(x$ID[1])
  assign(xx,
xts(VALUE, as.POSIXct(x$DATE, tz = GMT,
format = '%Y-%m-%d %H:%M:%S'),
  tzone = 'GMT'),
envir = env)
}
return(TRUE)
  }
}

xtsDaply - function(...)
{
  e1 - new.env(parent = baseenv())
  res - daply(X, ID, buildXtsFromDataFrame,
  env = e1)
  return(get(xx, e1))
}

Time04 - replicate(100,
  system.time(xtsDaply(X, X$ID))[[1]])




2011/4/4 Joshua Ulrich josh.m.ulr...@gmail.com:
 Hi Dan,

 On Mon, Apr 4, 2011 at 7:49 AM, Den Alpin den.al...@gmail.com wrote:
 I retrieve for a few hundred times a group of time series (10-15 ts
 with 1 values each), on every group I do some calculation, graphs
 etc. I wonder if there is a faster method than what presented below to
 get an appropriate timeseries object.

 Making a query with RODBC for every group I get a data frame like this:

 X
   ID                DATE     VALUE
 14  3 2000-01-01 00:00:03 0.5726334
 4   1 2000-01-01 00:00:03 0.8830174
 1   1 2000-01-01 00:00:00 0.2875775
 15  3 2000-01-01 00:00:04 0.1029247
 11  3 2000-01-01 00:00:00 0.9568333
 9   2 2000-01-01 00:00:03 0.5514350
 7   2 2000-01-01 00:00:01 0.5281055
 6   2 2000-01-01 00:00:00 0.0455565
 12  3 2000-01-01 00:00:01 0.4533342
 8   2 2000-01-01 00:00:02 0.8924190
 3   1 2000-01-01 00:00:02 0.4089769
 13  3 2000-01-01 00:00:02 0.6775706

 And I want to get a timeSeries object or xts object like this:

                            1         2         3
 2000-01-01 00:00:00 0.2875775 0.0455565 0.9568333
 2000-01-01 00:00:01        NA 0.5281055 0.4533342
 2000-01-01 00:00:02 0.4089769 0.8924190 0.6775706
 2000-01-01 00:00:03 0.8830174 0.5514350 0.5726334
 2000-01-01 00:00:04        NA        NA 0.1029247


 Input data can be sorted or unsorted (the most complicated case is in
 the example, unsorted and missing data) in the sense that I can sort
 in query if I can take an advantage from this.

 Some considerations:
 - Xts is generally faster than timeSeries
 - both accept a matrix so if I can create a matrix like the one
 represented above and an array of characters representing dates faster
 than what possible with xts:::cbind, for examole,I will have a faster
 implementation (package data.table ?).
 - create timeseries objects in multithread and then merge (package plyr ?)
 - faster merge algorithms?

 Below some code to generate the test case above:


 set.seed(123)
 N - 5 # number of observations
 K - 3 # number of timeseries ID

 X - data.frame(
  ID = rep(1:K, each = N),
  DATE = as.character(rep(as.POSIXct(2000-01-01, tz = GMT)+ 0:(N-1), K)),
  VALUE = runif(N*K), stringsAsFactors = FALSE)

 X - X[sample(1:(N*K), N*K),] # sample observations to get random
 order (optional)
 X - X[-(sample(1:nrow(X), floor(nrow(X)*0.2))),] # 20% missing

 head(X, 15)

 # use explicitly environments to avoid '-'
 buildTimeSeriesFromDataFrame - function(x, env)
 {
  {
    if(exists(xx, envir = env)) # if exist variable xx in env cbind
      assign(xx,
        cbind(get(xx, env), timeSeries(x$VALUE, x$DATE,
          format = '%Y-%m-%d %H:%M:%S',
          zone = 'GMT', units = as.character(x$ID[1]))),
        envir = env)
    else  # create xx in env
      assign(xx,
        timeSeries(x$VALUE, x$DATE, format = '%Y-%m-%d %H:%M:%S',
          zone = 'GMT', units = as.character(x$ID[1])),
        envir = env)

    return(TRUE)
  }
 }

 # use package plyr, faster than 'by' function
 tsDaply - function(...)
 {
  library(plyr)
  e1 - new.env(parent = baseenv()) #create a new env
  res - daply(X, ID, buildTimeSeriesFromDataFrame,
      env = e1)
  return(get(xx, e1)) # return xx from env
 }

 ##replicate 100 times
 #Time03 - replicate(100