On Sat, 3 Jan 2009, Duncan Murdoch wrote:
On 03/01/2009 1:37 PM, Ajay Shah wrote:
As for jit and Ra, that was my immediate reaction too, but I found that jit
does not help on your example. But I concur fully with what Ben said --- use
the tool that is appropriate for the task at hand. If your task is running
for loops, and Matlab does it faster and you have Matlab, well then you
should by all means use Matlab.
A good chunk of statistical computation involves loops. We are all
happy R users. I was surprised to see that we are so far from Matlab
in the crucial dimension of performance.
I don't know Matlab, but I think the thing that is slowing R down here is its
generality. When you write
a[i] <- a[i] + 1
in R, it could potentially change the meaning of a, [, <-, and + on each step
through the loop, so R looks them up again each time. I would guess that's
not possible in Matlab, or perhaps Matlab has an optimizer that can recognize
that in the context where the loop is being evaluated, those changes are
known not to happen.
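The point can be made concrete with a small sketch (mine, not from the original post): `+` is just a binding like any other in R, so a loop body may rebind it mid-loop, and the interpreter therefore has to look it up again on every iteration.

```r
# `+` is an ordinary function binding in R: the loop body can rebind it
# at any iteration, so the interpreter cannot cache the lookup.
f <- function() {
  a <- c(10, 10, 10)
  for (i in seq_along(a)) {
    a[i] <- a[i] + 1
    if (i == 1) `+` <- `-`  # perfectly legal: later iterations now subtract
  }
  a
}
f()  # 11 9 9
```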
R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables, including ones
like [<- that are assembled and looked up as strings on every call.
It *would* be possible to write such an optimizer for
R, and Luke Tierney's byte code compiler-in-progress might incorporate such a
thing.
The current byte code compiler available from my web site speeds this
(highly artificial) example by about a factor of 4. The experimental
byte code engine I am currently working on (and that can't yet do much
more than an example like this) speeds this up by a factor of
80. Whether that level of improvement (for toy examples like this)
will remain once the engine is more complete and whether a reasonable
compiler can optimize down to the assembly code I used remain to be
seen.
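For later readers: this work eventually shipped as the base `compiler` package (in R since 2.13.0, with the JIT enabled by default since 3.4.0). A minimal sketch, assuming a version of R where that package is present, of trying it on a loop like the one under discussion:

```r
library(compiler)

# An interpreted loop of the kind being benchmarked in this thread.
g <- function(n) {
  a <- numeric(n)
  for (i in seq_len(n)) a[i] <- a[i] + 1
  a
}

gc_ <- cmpfun(g)  # byte-compile the function

# Same results; timings can be compared with system.time() on a large n,
# e.g. system.time(g(1e6)) vs system.time(gc_(1e6)).
# (In recent R the JIT compiles g automatically, so the gap may not show.)
all(gc_(100) == g(100))
```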
For the difference in timing on the vectorized versions, I'd guess that
Matlab uses a better compiler than gcc. It's also likely that R incorporates
some unnecessary testing even in a case like this, because it's easier to
maintain code that is obviously sane than it is to maintain code that may not
be. R has a budget that is likely several orders of magnitude smaller than
Mathworks', so it makes sense to target our resources at more important
issues than making already-fast things run a bit faster.
Another possibility is optimization settings that may be higher and/or
more processor-specific than those used by R.
We do handle the case where both arguments to + are scalar (i.e. of
length 1) separately but I don't recall if we do so for the
vector/scalar case also -- I suspect not as that would make the code
less maintainable for not a very substantial gain.
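For completeness, a sketch (mine, not from the thread) of the two forms whose timings are being compared: the loop performs one interpreted iteration per element, with its lookups and argument-list allocation, while the vectorized form dispatches `+` once and does the arithmetic in C over the whole vector.

```r
n <- 1e5
a <- numeric(n)

# Loop form: one interpreted iteration per element.
for (i in seq_len(n)) a[i] <- a[i] + 1

# Vectorized form: a single vector/scalar `+` dispatch.
b <- numeric(n) + 1

identical(a, b)  # TRUE
```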
luke
Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
Department of Statistics and Actuarial Science
241 Schaeffer Hall
University of Iowa
Iowa City, IA 52242
Phone: 319-335-3386
Fax: 319-335-3017
email: l...@stat.uiowa.edu
WWW: http://www.stat.uiowa.edu