On Sat, 3 Jan 2009, Duncan Murdoch wrote:

On 03/01/2009 1:37 PM, Ajay Shah wrote:
As for jit and Ra, that was my immediate reaction too, but I found that jit does not help on your example. But I concur fully with what Ben said --- use the tool that is appropriate for the task at hand. If your task is running for loops, and Matlab does it faster and you have Matlab, then by all means use Matlab.

A good chunk of statistical computation involves loops. We are all
happy R users. I was surprised to see that we are so far behind Matlab
in the crucial dimension of performance.


I don't know Matlab, but I think the thing that is slowing R down here is its generality. When you write

a[i] <- a[i] + 1

in R, it could potentially change the meaning of a, [, <-, and + on each step through the loop, so R looks them up again each time. I would guess that's not possible in Matlab, or perhaps Matlab has an optimizer that can recognize that in the context where the loop is being evaluated, those changes are known not to happen.
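
To see that generality in action (a contrived sketch of my own, not
from the original example): rebinding + partway through the loop
changes what later iterations compute, which is why R cannot treat
these lookups as loop-invariant:

f <- function() {
  a <- numeric(3)
  for (i in 1:3) {
    a[i] <- a[i] + 1
    # after the second iteration, rebind '+' in the local frame
    if (i == 2) `+` <- function(e1, e2) e1 - e2
  }
  a
}
f()  # 1 1 -1: the third iteration picks up the rebound '+'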

R's interpreter is fairly slow due in large part to the allocation of
argument lists and the cost of lookups of variables, including ones
like [<- that are assembled and looked up as strings on every call.
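
Concretely, a[i] <- a[i] + 1 is evaluated as roughly the following
sequence (this desugaring is documented in the R Language Definition;
*tmp* is the name R actually uses):

`*tmp*` <- a
a <- `[<-`(`*tmp*`, i, value = a[i] + 1)  # replacement function call
rm(`*tmp*`)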

It *would* be possible to write such an optimizer for R, and Luke Tierney's byte code compiler-in-progress might incorporate such a thing.

The current byte code compiler available from my web site speeds this
(highly artificial) example by about a factor of 4.  The experimental
byte code engine I am currently working on (and that can't yet do much
more than an example like this) speeds this up by a factor of
80. Whether that level of improvement (for toy examples like this)
will remain once the engine is more complete and whether a reasonable
compiler can optimize down to the assembly code I used remain to be
seen.
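
For anyone who wants to try this at home, the pattern is along these
lines (a sketch; it assumes the compiler is loadable as the compiler
package, as in the R releases that bundle it):

library(compiler)
f <- function(n) {
    a <- numeric(n)
    for (i in seq_len(n)) a[i] <- a[i] + 1
    a
}
fc <- cmpfun(f)          # byte-compile the function
system.time(f(1e6))      # interpreted loop
system.time(fc(1e6))     # byte-compiled version of the same loop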

For the difference in timing on the vectorized versions, I'd guess that Matlab uses a better compiler than gcc. It's also likely that R incorporates some unnecessary testing even in a case like this, because it's easier to maintain code that is obviously sane than it is to maintain code that may not be. R has a budget that is likely several orders of magnitude smaller than the MathWorks', so it makes sense to target our resources at more important issues than making fast things run a bit faster.
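
For reference, the two versions being compared are presumably along
these lines (my reconstruction; the original code is not quoted in
this message):

n <- 1e6
a <- numeric(n)
system.time(for (i in seq_len(n)) a[i] <- a[i] + 1)  # explicit loop
system.time(a <- a + 1)                              # vectorized: one call into C code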

Another possibility is optimization settings that may be higher and/or
more processor-specific than those used by R.

We do handle the case where both arguments to + are scalar (i.e. of
length 1) separately, but I don't recall if we do so for the
vector/scalar case also -- I suspect not, as that would make the code
less maintainable for not a very substantial gain.

luke

Duncan Murdoch

--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      l...@stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
