
Re: [R] dist like function but where you can configure the method

2014-05-17 Thread David L Carlson

Function designdist() in package vegan lets you define your own distance 
measure, but it does not let you simply provide a function as your original 
request indicated. The documentation for function distance() in package ecodist 
says it is written to make it simple to add new distance functions, but warns 
that it is not efficient for large matrices.
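A hedged sketch of the designdist() route: the dissimilarity is written as a formula in the terms A, B and J rather than passed as an R function. It assumes vegan is installed, and relies on our reading of vegan's documentation that terms = "minimum" with "(A+B-2*J)/(A+B)" gives Bray-Curtis, the vegdist() default. Note that vegan compares rows, not columns.

```r
# Sketch: a user-defined dissimilarity via vegan::designdist().
# Assumes the vegan package is installed; rows of 'x' are the items compared.
x <- matrix(c(1, 0, 3,
              2, 2, 0,
              0, 5, 1), nrow = 3, byrow = TRUE)

if (requireNamespace("vegan", quietly = TRUE)) {
  # With terms = "minimum", J is sum(pmin(x_i, x_j)) and A, B are the two
  # row sums, so this formula should reproduce Bray-Curtis dissimilarity.
  d_custom <- vegan::designdist(x, method = "(A+B-2*J)/(A+B)",
                                terms = "minimum")
  d_bray <- vegan::vegdist(x, method = "bray")
  print(all.equal(as.vector(d_custom), as.vector(d_bray)))
}
```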

David Carlson

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Witold E Wolski
Sent: Friday, May 16, 2014 3:00 PM
To: Rui Barradas
Cc: Jari Oksanen; r-h...@stat.math.ethz.ch; Barry Rowlingson
Subject: Re: [R] dist like function but where you can configure the method

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Jari Oksanen

Witold E Wolski wewolski at gmail.com writes:

 
 Looking for a fast dist implementation
 where I could pass my own dist function to the method parameter
 
 i.e.
 
 mydistfun = function(x,y){
  return(ks.test(x,y)$p.value)   #some mystique implementation
 }
 
 wow = dist(data,method=mydistfun)

I think it is best to write that function yourself.

The dist object is a vector corresponding to the lower triangle
(without the diagonal) of a symmetric matrix, with attributes.
The attributes are: class, which should be c("mydist", "dist"); Size,
which is the number of items, length(x); Labels (optional), which are the
names of your items and, if given, should have length(x);
call = match.call(); Diag = FALSE; Upper = FALSE; and method, the method name.
All you need is a vector with attributes.

All this will add very little overhead to your calculation, so
for all practical purposes this implementation is just as fast as 
is your mystique implementation of pairwise distances. Your
example (ks.test()) probably would be pretty slow. If you can
vectorize your distance, it can be really fast, even if you 
calculate the full symmetric matrix and throw away the diagonal and
upper triangle.
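A minimal sketch of the construction described above, assuming the items are the columns of a numeric matrix x and that FUN returns one number per pair. The name dist_by and its arguments are hypothetical; this is not a function in stats or any package.

```r
# Hypothetical dist()-like constructor that accepts any pairwise function.
# Items are the COLUMNS of 'x'; FUN(a, b) must return a single number.
dist_by <- function(x, FUN, method = "custom") {
  x <- as.matrix(x)
  n <- ncol(x)
  # combn() lists pairs (1,2), (1,3), ..., (2,3), ...: the same order in
  # which a "dist" object stores its lower triangle, column by column.
  pairs <- combn(n, 2)
  d <- vapply(seq_len(ncol(pairs)),
              function(k) FUN(x[, pairs[1, k]], x[, pairs[2, k]]),
              FUN.VALUE = numeric(1))
  # The attributes listed above; a NULL colnames(x) leaves "Labels" unset.
  attr(d, "Size") <- n
  attr(d, "Labels") <- colnames(x)
  attr(d, "Diag") <- FALSE
  attr(d, "Upper") <- FALSE
  attr(d, "method") <- method
  attr(d, "call") <- match.call()
  class(d) <- c("mydist", "dist")
  d
}

# Usage: Euclidean distance between three columns, compared with
# stats::dist(), which works on rows (hence t()).
x <- matrix(c(0, 0,  3, 4,  6, 8), nrow = 2)
d <- dist_by(x, function(a, b) sqrt(sum((a - b)^2)), method = "euclidean")
print(d)
all.equal(as.vector(d), as.vector(dist(t(x))))
```

Because combn() enumerates pairs in the order a dist object stores them, no index arithmetic is needed; it does require at least two columns. For the original question, FUN would be function(a, b) ks.test(a, b)$p.value.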

Cheers, Jari Oksanen


Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Witold E Wolski

Dear Jari,

Thanks for your reply...

The overhead would be
the 2 for loops
for(i in 1:dim(x)[2])
for(j in i:dim(x)[2])

wouldn't it? Or do you see a different way to implement it?

A for loop is pretty expensive in R. Therefore I am looking for an
implementation similar to apply or lapply, where the iteration is done
in native code.
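For comparison, the two nested loops described above can be written out explicitly. This is a sketch: pairwise_loop is a hypothetical name, and the diagonal is skipped because a dist object does not store it; stats::as.dist() keeps only the lower triangle of the filled matrix.

```r
# Hypothetical helper: fill a matrix with two nested for loops over the
# columns of 'x', then convert it with stats::as.dist().
pairwise_loop <- function(x, FUN) {
  n <- dim(x)[2]
  m <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Only the lower triangle (row j > column i) is read by as.dist().
      m[j, i] <- FUN(x[, i], x[, j])
    }
  }
  as.dist(m)
}

x <- matrix(c(0, 0,  3, 4,  6, 8), nrow = 2)
d_loop <- pairwise_loop(x, function(a, b) sqrt(sum((a - b)^2)))
print(d_loop)
```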





On 16 May 2014 15:57, Jari Oksanen jari.oksa...@oulu.fi wrote:

-- 
Witold Eryk Wolski


Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Barry Rowlingson

On Fri, May 16, 2014 at 4:46 PM, Witold E Wolski wewol...@gmail.com wrote:
 A for loop is pretty expensive in R. Therefore I am looking for an
 implementation similar to apply or lapply were the iteration is made
 in native code.

No, a for loop is not pretty expensive in R -- at least not compared
to doing a k-s test:

  system.time(for(i in 1:10000){ks.test(runif(100),runif(100))})
   user  system elapsed
  3.680   0.012   3.697

 3.68 seconds to do 10000 ks tests (and generate 200 runifs for each).

  system.time(for(i in 1:10000){})
   user  system elapsed
  0.000   0.000   0.001

 0.000 s to do 10000 empty loops. Oh, let's nest it for fun:

  system.time(for(i in 1:100){for(i in 1:100){ks.test(runif(100),runif(100))}})
   user  system elapsed
  3.692   0.004   3.701

 No different. Even with only 5 items per ks-test, the loop takes me 2.2 seconds.

Moral: don't worry about the for loops.

Barry


Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Bert Gunter

Yes, ... and further

apply-type functions still have to loop at the interpreter level, and
generally take about the same time as their translation to for loops
(with suitable caveats for this kind of vague assertion). Their chief
advantage is readability and adherence to R's functional paradigm
(again with suitable caveats).

Alternatively, byte code compilation with the compiler package **may**
(significantly) improve speed, but it very much depends ...

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch




On Fri, May 16, 2014 at 9:12 AM, Barry Rowlingson
b.rowling...@lancaster.ac.uk wrote:

Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Jari Oksanen

I did not regard the loops as the overhead but as a part of the process. The overhead 
is setting the attributes. The loop is not very expensive compared to ks.test(). 
You can always replace the loop with an apply over the vector of indices, but 
about the only way to speed up the calculations is to use parallel processing 
(the parLapply, parSapply and parRapply functions of the parallel package).
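A sketch of the parallel option using parSapply() from the parallel package. The helper pair_value, the choice of two workers, and the random test data are assumptions for illustration. Whether it pays off depends on how expensive each pairwise call is compared with starting the workers and shipping x to them.

```r
library(parallel)

# Hypothetical worker function: value for the k-th pair of columns.
# Defined at top level so only its arguments are shipped to the workers.
pair_value <- function(k, x, f, pairs) f(x[, pairs[1, k]], x[, pairs[2, k]])

set.seed(1)
x <- matrix(runif(100 * 4), ncol = 4)   # four items (columns) of 100 values
pairs <- combn(ncol(x), 2)              # pairs in dist (lower-triangle) order
ks_p <- function(a, b) ks.test(a, b)$p.value

cl <- makeCluster(2)                    # two local worker processes (assumed)
p_par <- parSapply(cl, seq_len(ncol(pairs)), pair_value,
                   x = x, f = ks_p, pairs = pairs)
stopCluster(cl)

# The same values computed serially, for comparison.
p_seq <- sapply(seq_len(ncol(pairs)), pair_value,
                x = x, f = ks_p, pairs = pairs)
all.equal(p_par, p_seq)
```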

I wrote about vectorization: that would be faster, but it cannot be applied 
blindly to just any function. You must deconstruct the function to see if 
it can be decomposed into operations on vectors. In vegan:::designdist we do that 
for some function types, but you really must *think* about the function you are 
using to know if you can write it in vectorized form. It is not automatic.
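To illustrate what decomposing a distance into vector operations can look like: Euclidean distance satisfies ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so every pair can be computed from column sums of squares and one crossprod(). The sketch computes the full symmetric matrix and keeps only the lower triangle; it works because this particular distance decomposes, and nothing similar is implied for ks.test().

```r
# Sketch: Euclidean distances between the columns of x without looping
# over pairs, using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
x <- matrix(c(0, 0,  3, 4,  6, 8), nrow = 2)

sq <- colSums(x^2)                 # ||a||^2 for every column
g  <- crossprod(x)                 # all dot products a.b at once
d2 <- outer(sq, sq, "+") - 2 * g   # full symmetric matrix of squared distances
# Clamp tiny negative rounding errors, then keep only the lower triangle.
d  <- as.dist(sqrt(pmax(d2, 0)))

all.equal(as.vector(d), as.vector(dist(t(x))))
```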

Cheers, Jari Oksanen
On 16/05/2014, at 18:46 PM, Witold E Wolski wrote:


Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Rui Barradas


Hello,

The compiler package is good at speeding up for loops, but in this case 
the gain is negligible. The ks test is the real time problem.


library(compiler)

f1 <- function(n){
  for(i in 1:100){
    for(j in 1:100){
      ks.test(runif(100),runif(100))
    }
  }
}

f1.c <- cmpfun(f1)

system.time(f1())
   user  system elapsed
   3.50    0.00    3.53
system.time(f1.c())
   user  system elapsed
   3.47    0.00    3.48


Rui Barradas

On 16-05-2014 17:12, Barry Rowlingson wrote:


Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Witold E Wolski

Ouch,

First: my question was not how to implement dist but whether there is a
more generic dist function than stats::dist.

Secondly: ks.test is meant as a placeholder (see the comment in the
code I sent) for any other function taking two vector arguments.

Third: I do subscribe to the idea that a function call is easier to
read and understand than a for loop. @Bert apply is a native C
function and the loop is not interpreted AFAIK.

@Rui @Barry @Jari What are you benchmarking? An empty loop?

Look at the trivial benchmarks below: _apply_ clearly outperforms a
for loop in R. It always has; it outperforms even an empty for loop.

# an empty unrealistic for loop as suggested by Rui, Barry and Jari
f1 <- function(n){
  for(i in 1:n){
    for(j in 1:n){
    }
  }
}


myfunc = function(x,y=x){x-y}

# a for loop which actually does something
f2 <- function(n){
  mm <- matrix(0,ncol=n,nrow=n)
  for(i in 1:n){
    for(j in 1:n){
      mm[i,j] = myfunc(i,j)
    }
  }
  return(mm)
}

# and an array
f3 = function(n){
  res = rep(0,n*n)
  for(i in 1:(n*n))
  {
    res[i] = myfunc(i)
  }
}


n = 1000
system.time(f1(n))
system.time(f2(n))
system.time(f3(n))
system.time(apply(t(1:(n*n)),1,myfunc))


 system.time(f1(n))
   user  system elapsed
   0.28    0.00    0.28
 system.time(f2(n))
   user  system elapsed
   6.80    0.00    7.09
 system.time(f3(n))
   user  system elapsed
   5.83    0.00    5.98
 system.time(apply(t(1:(n*n)),1,myfunc))
   user  system elapsed
   0.19    0.00    0.19
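For f2() in particular, a faster route than either for or apply is vectorization: myfunc() is just a subtraction, which R already applies elementwise. A sketch using base R's outer(), which calls myfunc once with two vectors of length n^2; n = 200 is an arbitrary size chosen to keep the loop version quick, and no timings are claimed here.

```r
myfunc <- function(x, y = x) { x - y }

# Loop version, as f2() above: one interpreted call per element.
f2 <- function(n) {
  mm <- matrix(0, ncol = n, nrow = n)
  for (i in 1:n) for (j in 1:n) mm[i, j] <- myfunc(i, j)
  mm
}

n <- 200
m_loop <- f2(n)
# Vectorized version: outer() calls myfunc once, on
# rep(1:n, times = n) and rep(1:n, each = n), and sets dim = c(n, n).
m_vec <- outer(1:n, 1:n, myfunc)
all.equal(m_loop, m_vec)
```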






On 16 May 2014 20:55, Rui Barradas ruipbarra...@sapo.pt wrote:
-- 
Witold Eryk Wolski


Re: [R] dist like function but where you can configure the method

2014-05-16 Thread William Dunlap

 system.time(apply(t(1:(n*n)),1,myfunc))
   user  system elapsed
   0.19    0.00    0.19

That calls 'myfunc' exactly once:

 system.time(apply(t(1:(3*3)), 1, print))
[1] 1 2 3 4 5 6 7 8 9
   user  system elapsed
  0   0   0
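The same point can be checked by counting calls. counting_myfunc is a hypothetical instrumented copy of myfunc; t(1:(n*n)) is a 1 x n^2 matrix, so apply(..., 1, ...) sees a single row, while an n^2 x 1 matrix gives one call per element.

```r
# Hypothetical instrumented copy of myfunc that counts its calls.
calls <- 0
counting_myfunc <- function(x, y = x) {
  calls <<- calls + 1   # increment the counter in the enclosing environment
  x - y
}

n <- 3
calls <- 0
invisible(apply(t(1:(n * n)), 1, counting_myfunc))   # 1 x 9 matrix: one row
calls_one_row <- calls

calls <- 0
invisible(apply(matrix(1:(n * n), ncol = 1), 1, counting_myfunc))  # 9 x 1
calls_nine_rows <- calls

c(calls_one_row, calls_nine_rows)
```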


Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, May 16, 2014 at 1:00 PM, Witold E Wolski wewol...@gmail.com wrote:

Re: [R] dist like function but where you can configure the method

2014-05-16 Thread Bert Gunter

If the apply() call is not empty, its contents must of course be
interpreted. That's where the time goes.

system.time(for(i in 1:1e6)rnorm(1))
   user  system elapsed
   5.25    0.00    5.29

 system.time(lapply(1:1e6,rnorm,n=1))
   user  system elapsed
   9.64    0.01    9.72

 system.time(vapply(1:1e6,rnorm,FUN.VALUE=0,n=1))
   user  system elapsed
   5.69    0.00    5.73


I rest my case.
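A small-scale sketch of the variants in the timing above, confirming that each makes one interpreted call to rnorm() per element and consumes the random stream in the same order. It also shows a detail of the lapply() call: with n = 1 named, each element of 1:n is matched positionally to rnorm()'s mean argument, which does not affect the timing comparison.

```r
n <- 5

# for loop: one rnorm() call per element.
set.seed(42)
res_for <- numeric(n)
for (i in 1:n) res_for[i] <- rnorm(1)

# vapply(): also one rnorm() call per element, same draws in the same order.
set.seed(42)
res_vapply <- vapply(1:n, function(i) rnorm(1), FUN.VALUE = 0)

# lapply(1:n, rnorm, n = 1) calls rnorm(i, n = 1), so i becomes the mean.
set.seed(42)
res_lapply <- unlist(lapply(1:n, rnorm, n = 1))

all.equal(res_for, res_vapply)
all.equal(res_lapply, res_for + 1:n)
```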

Cheers,
Bert

On Fri, May 16, 2014 at 1:00 PM, Witold E Wolski wewol...@gmail.com wrote:
