On Fri, 26 Dec 2008, Bert Gunter wrote:

Thank you for the clarification, Brian. This is very helpful (as usual).

However, I think the important point, which I misstated, is that whether it
is for() or, e.g., lapply(), the "loop" contents must be evaluated at the
interpreted R level, and this is where most time is typically spent. To get
the speedup that most people hope for, the key is to avoid the interpreted
loop altogether -- i.e. to move both the loop **and** the evaluations to the
C level -- via R programming, e.g. via matrix operations, indexing, or
built-in .Internal functions.
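
As a small illustration of what I mean (just a sketch; the exact numbers are
beside the point), both versions below compute the same result, but the
second moves the loop **and** the element-wise evaluation down to C via
logical indexing:

x <- rnorm(1e6)

# interpreted loop: the loop and every comparison/assignment run at the R level
y1 <- numeric(length(x))
for(i in seq_along(x)) {
    if(x[i] < 0) y1[i] <- 0 else y1[i] <- x[i]
}

# vectorized equivalent: the comparison and the replacement are done in C
y2 <- x
y2[y2 < 0] <- 0

identical(y1, y2)   # TRUE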

Please correct me if I'm (even partially) wrong. As you know, the issue
arises frequently.

'Typically' is not the whole story.  In a loop like

Y <- double(length(X))
for(i in seq_along(X)) Y[i] <- fun(X[i])

quite a lot of time and memory may be spent in re-allocating Y at each
step of the loop, and lapply() is able to avoid that.  E.g.

X <- runif(1e6)
system.time({
    Y <- double(length(X))
    for(i in seq_along(X)) Y[i] <- sin(X[i])
})

takes 5.2 secs, versus 1.5 secs for unlist(lapply(X, sin)). Of course, using the vectorized function sin() directly takes 0.05 sec. And if you use sapply() you will lose all the gain.
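
For reference, the whole comparison can be reproduced along these lines
(absolute timings will of course vary by machine and R version):

X <- runif(1e6)

system.time({                             # explicit for() loop, result pre-allocated
    Y1 <- double(length(X))
    for(i in seq_along(X)) Y1[i] <- sin(X[i])
})
system.time(Y2 <- unlist(lapply(X, sin))) # lapply(): loop in C, much less copying
system.time(Y3 <- sapply(X, sin))         # sapply(): the simplification step loses the gain
system.time(Y4 <- sin(X))                 # vectorized: loop and evaluation both in C
all.equal(Y1, Y2)                         # same results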

This is not a typical example, but it arises often enough to make it worthwhile having an optimized lapply().


-- Bert Gunter
Genentech

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Prof Brian Ripley
Sent: Friday, December 26, 2008 12:44 AM
To: Oliver Bandel
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] How can I avoid nested 'for' loops or quicken the process?

On Thu, 25 Dec 2008, Oliver Bandel wrote:

Bert Gunter <gunter.berton <at> gene.com> writes:


FWIW:

Good advice below! -- after all, the first rule of optimizing code is:
Don't!

For the record (yet again), the apply() family of functions (and their
packaged derivatives, of course) are "merely" very carefully written for()
loops: their main advantage is in code readability, not in efficiency gains,
which may well be small or nonexistent. True efficiency gains require
"vectorization", which essentially moves the for() loops from interpreted
code to (underlying) C code on the underlying data structures: e.g. compare
rowMeans() [vectorized] with ave() or apply(..,1,mean).
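
A rough sketch of that comparison (absolute timings depend on the machine
and R version):

m <- matrix(rnorm(1e6), nrow = 1e4)   # 10000 x 100 matrix
system.time(r1 <- rowMeans(m))        # vectorized: the loop over rows is in C
system.time(r2 <- apply(m, 1, mean))  # R-level loop over the rows
all.equal(r1, r2)                     # same answer, very different cost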
[...]

The apply functions do bring speed advantages.

This is not only something I have read: I have used the apply functions
and really did get faster results.

The reason is simple: an apply function does in C what would otherwise be
done at the R level with for loops.

Not true of apply(): true of lapply() and hence sapply().  I'll leave you
to check eapply, mapply, rapply, tapply.

So the issue is what is meant by 'the apply() family of functions': people
often mean *apply(), of which apply() is an unusual member, if one at all.
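
One easy way to see the difference, at least in recent versions of R, is to
look at the definitions themselves:

lapply        # prints the R source; the body is essentially .Internal(lapply(X, FUN))
body(apply)   # ordinary R code, containing explicit for() loops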

[Historical note: a decade ago lapply was internally a for() loop.  I
rewrote it in C in 2000: I also moved apply to C at the same time but it
proved too little an advantage and was reverted.  The speed of lapply
comes mainly from reduced memory allocation: for() is also written in C.]

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
