Hi Nelissa

I hope the answers from Joshua and Milan helped you figure out a good
solution.

So basically, in R you have two big parallel frameworks: parallel/snow,
described by Joshua, and foreach. If you were to use foreach, you would
probably need a nested loop, which is described in detail in a
vignette:
http://stat.ethz.ch/CRAN/web/packages/foreach/vignettes/nested.pdf
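
For the nested case, the key piece is foreach's %:% nesting operator.
A minimal sketch (the grid values and the loop body are placeholders;
swap %do% for %dopar% once a parallel backend is registered):

library(foreach)
res <- foreach(lag = 1:3, .combine = rbind) %:%
  foreach(trim = c(0.05, 0.10, 0.15), .combine = c) %do% {
    lag + trim  # placeholder for the real model fit
  }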


In any case, I guess you will want to be more specific in your loop
about what is returned, and select precisely the parameter you want
(instead of returning the whole object). In your case, the function
getTh() will probably do the trick.
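
For instance, here is a minimal, untested sketch assuming tsDyn's
TVECM() and getTh(), and a hypothetical list 'datasets' holding your
series (you will likely need to adjust the TVECM() arguments):

library(doParallel)
library(tsDyn)
cl <- makeCluster(4)
registerDoParallel(cl)
th <- foreach(d = datasets, .combine = c, .packages = "tsDyn") %dopar% {
  fit <- TVECM(d, nthresh = 1, lag = 1, plot = FALSE, trace = FALSE)
  getTh(fit)  # return only the estimated threshold, not the whole fit
}
stopCluster(cl)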

Now, regarding the specifics of the threshold VECM: unfortunately, the
lm() methods discussed by Joshua won't work here, and you cannot even
use Joshua's trick of fitting multiple responses at once (a simple
reason being that a VECM is already a multiple-response model).
Threshold VECMs are computationally heavy models, so do expect each
model to take a long time to run, and an enormous amount of time to
run 3000 models!

Do not hesitate also to ask on the dedicated mailing list:
ts...@googlegroups.com

best

Date: Mon, 18 Feb 2013 08:50:04 -0800
From: Joshua Wiley <jwiley.ps...@gmail.com>
To: Milan Bouchet-Valat <nalimi...@club.fr>
Cc: "r-help@r-project.org" <r-help@r-project.org>, "Jamora, Nelissa"
        <nelissa.jam...@agr.uni-goettingen.de>
Subject: Re: [R] foreach loop, stata equivalent

If you have the same set of predictors for multiple outcomes, you can
speed up the process considerably by taking advantage of the fact that
the design matrix is shared across all of the outcomes.  As an example:

set.seed(10)
y <- matrix(rnorm(10000 * 14), ncol = 14)  # 14 outcome variables
x <- matrix(rnorm(10000 * 2), ncol = 2)    # 2 shared predictors
## one lm() call per outcome
system.time(res <- lapply(1:14, function(i) lm(y[, i] ~ x)))
##   user  system elapsed
##   0.34    0.00    0.34
## one lm() call for all 14 outcomes (matrix response)
system.time(res2 <- lm(y ~ x))
##   user  system elapsed
##   0.05    0.02    0.06

lm() can accept a matrix as the dependent variable.  So if you try
various combinations of predictors for all 14 outcomes, do not fit
14 * (number of combinations of predictors) models separately; fit
them in chunks of 14 for substantial speedups.
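
For example, suppose there were two candidate predictor sets
(hypothetical here, reusing x from above); each call below fits all 14
outcomes against one set at once:

xsets <- list(x[, 1, drop = FALSE], x)          # two candidate sets
fits <- lapply(xsets, function(xs) lm(y ~ xs))  # 2 calls, not 2 * 14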

Finally, as long as you are not using factors or any other fancy
things, and can work with just raw data matrices, use lm.fit() instead
of lm(), which is a high-level function.  lm.fit() is not especially
clever; it just expects the design matrix and response matrix.  It
will not add an intercept by default, so column-bind a vector of 1s
onto your design matrix.

## cbind(1, x) supplies the intercept column explicitly
system.time(res3 <- lm.fit(cbind(1, x), y))
##   user  system elapsed
##   0.02    0.00    0.01

All three methods produce identical coefficient estimates (note that
lm.fit() names the coefficients differently):

res[[1]]
## Call:
## lm(formula = y[, i] ~ x)
## Coefficients:
## (Intercept)           x1           x2
##  0.0014401   -0.0198232   -0.0005721
res2[[1]][, 1]
##   (Intercept)            x1            x2
##  0.0014401149 -0.0198232209 -0.0005720764
res3[[1]][, 1]
##            x1            x2            x3
##  0.0014401149 -0.0198232209 -0.0005720764

However, instead of fitting each response one at a time, taking
advantage of fitting multiple responses at once (so the design matrix
can be reused), and of lower-level functions when you have a simple,
specific, repetitive task, takes the estimation time from .34 seconds
down to .01.  You would need 34 cores to achieve that by simply
throwing more hardware at the problem, as opposed to using more
optimized code.  Of course, you can do both, and probably get the
results pretty quickly.

Cheers,

Josh
