On Fri, Jun 5, 2009 at 4:09 AM, Allan Engelhardt all...@cybaea.com wrote:
I'm all done now. The max2 version below is what I went with in the end
for my proposed change to caret::nearZeroVar (which used the sort method).
Max Kuhn will make it available on CRAN soon. It speeds up that routine by
a factor 2-5 on my test cases and uses much less memory.
You can save a little in max2 like this:
max2a = {w-which.max(x); x[w]/max(x[-w], na.rm=TRUE);}
If you don't need to handle NA's (or if you know a priori how many there
are), you can also speed up part:
parta = {sel - length(x)+c(-1,0); a-sort.int(x, partial=sel,
na.last=NA)[2:1]; a[1]/a[2];}
which becomes about as fast as max2.
library(rbenchmark)
set.seed(1); x - runif(1e7, max=1e8);
benchmark(
replications=20,
columns=c(test,elapsed),
order=elapsed
, sort = {a-sort(x, decreasing=TRUE, na.last=NA)[1:2];
a[1]/a[2];}
, qsrt = {a-sort(x, decreasing=TRUE, na.last=NA, method=quick)[1:2];
a[1]/a[2];}
, part = {a-sort.int(-x, partial=1:2, na.last=NA)[1:2];
a[1]/a[2];}
, parta = {end-length(x)+c(-1,0);
a-sort.int(x, partial=end, na.last=FALSE)[end];
a[1]/a[2]; }
, max1 = {m-max(x, na.rm=TRUE);
w-which(x==m)[1];
m/max(x[-w],na.rm=TRUE);}
, max2 = {w-which.max(x);
max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
, max2a = {w-which.max(x);
x[w]/max(x[-w], na.rm=TRUE);}
)
test elapsed
7 max2a7.80
6 max28.94
4 parta9.05
3 part 10.72
5 max1 20.21
2 qsrt 49.33
1 sort 94.18
For what it is worth, I also made a C version (cmax below) which of
course is faster yet again and scales nicely for returning the top n values
of the array:
cmax - function (v) {max - vector(double,2); max - .C(test,
as.double(v), as.integer(length(v)), max, NAOK=TRUE)[[3]];
return(max[1]/max[2]);}
library(rbenchmark)
set.seed(1); x - runif(1e7, max=1e8); x[1] - NA;
benchmark(
replications=20,
columns=c(test,elapsed),
order=elapsed
, sort = {a-sort(x, decreasing=TRUE, na.last=NA)[1:2]; a[1]/a[2];}
, qsrt = {a-sort(x, decreasing=TRUE, na.last=NA, method=quick)[1:2];
a[1]/a[2];}
, part = {a-sort.int(-x, partial=1:2, na.last=NA)[1:2]; a[1]/a[2];}
, max1 = {m-max(x, na.rm=TRUE); w-which(x==m)[1];
m/max(x[-w],na.rm=TRUE);}
, max2 = {w-which.max(x); max(x, na.rm=TRUE)/max(x[-w], na.rm=TRUE);}
, cmax = {cmax(x);}
)
# test elapsed
# 6 cmax 4.394
# 5 max2 8.954
# 4 max1 18.835
# 3 part 21.749
# 2 qsrt 46.692
# 1 sort 77.679
Thanks for all the suggestions and comments.
Allan.
PS: Slightly off-topic but is there a way within the syntax of R to set up
things so that 'sort' (or any function) would know it is called in a partial
list context in sort(x)[1:2] and it therefore could choose to use the
partial argument automatically for small [] lists? The R interpreter of
course knows full well that it is going to drop all but the first two values
of the result before it calls 'sort'. Perl has 'use Want' where howmany()
and want(n) provides a subset of this functionality (essentially for []
lists of the form 1:n).
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.