On Sun, Jan 9, 2011 at 9:18 PM, Roger Bivand <[email protected]> wrote:
> On Sun, 9 Jan 2011, Markus Neteler wrote:
...
>> http://cran.r-project.org/web/views/HighPerformanceComputing.html
>>
>> An openMP approach or likewise with implicit parallelism would be great
>> since I cannot rewrite R...
>
> Markus,
>
> No implicit - one needs to use mechanisms in the snow package (using sockets
> is easiest and faster than PVM or MPI, but isn't fault tolerant) or similar
> to start R on the worker nodes, and to divide up the tiles or whatever that
> can be parallelised into a list for execution.

Roger,

thanks for your help. However, for the above suggestion one needs to know the
data and algorithms (at least somewhat) - in this case I was "blindly"
executing some calculations for Nikos.

> Depending on the data
> configuration, this can help or not (if all the nodes need memory for large
> objects, then they crowd out the machine). It all depends on what is being
> done. If a task is embarrassingly parallelisable (like bootstrapping, or
> kriging from few points to many tiles), it can be effective, but one needs
> to think through all the implications and plan the work to suit the problem
> at hand and the available hardware. I don't think that openMP can schedule
> arbitrary job chunks by itself either?

I was thinking about the low-level functions in R which are regularly called,
less about individual extensions. For sure, openMP requires good code
knowledge; I tried a bit together with Yann Chemin to parallelize i.atcorr
some time ago.

> References in:
>
> http://www.nhh.no/Admin/Public/DWSDownload.aspx?File=%2fFiles%2fFiler%2finstitutter%2fsam%2fDiscussion+papers%2f2010%2f25.pdf

Thanks for this nice paper, I wasn't aware of it.

> I can send the example script from the paper if it would be useful, as a
> rough template for simplistic use of snow.
>
> But I've no idea whether this application is embarrassingly parallelisable, I
> suspect not, and that it is using dense matrix methods where sparse methods
> might be possible - vegdist() returns a dist object, which is (n*(n-1))/2 in
> size. It will then copy these and use them as matrices. To reduce memory
> usage with big N, mrpp() should be rewritten to make dmat sparse over a
> distance threshold. Internally, dmat is promoted to full dense
> representation (n*n), here quite big. Why it is bloating in your case, I
> don't know.
>
> Looking at mrpp(), it could be parallelised in:
>
> perms <- sapply(1:permutations, function(x) grouping[permuted.index(N,
>     strata = strata)])
> m.ds <- numeric(permutations)
> m.ds <- apply(perms, 2, function(x) mrpp.perms(x, dmat, indls,
>     w))
>
> by spreading permutation burden across nodes if (and only if) one could avoid
> copying dmat out to each node. It would need careful analysis.

I guess that I have to leave that to the experts... Thanks for your advice,
though, I hope it will be picked up from this list.

thanks
Markus

--
Markus Neteler, PhD
Fondazione Edmund Mach (FEM) - IASMA Research and Innovation Centre
Department of Biodiversity and Molecular Ecology
Head of GIS and Remote Sensing Unit
Via E. Mach, 1 - 38010 S. Michele all'Adige (TN), Italy
Web: http://gis.cri.fmach.it - http://grass.osgeo.org

> Hope this helps,
>
> Roger
>
>>
>> thanks
>> Markus
>>
>
> --
> Roger Bivand
> Economic Geography Section, Department of Economics, Norwegian School of
> Economics and Business Administration, Helleveien 30, N-5045 Bergen,
> Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: [email protected]
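
For anyone picking this up from the list: the permutation step discussed in
this thread could be spread over snow socket workers roughly as in the sketch
below. It is only an illustration under stated assumptions, not vegan's
mrpp() code. The data, the group labels and the within_delta() statistic are
hypothetical stand-ins (within_delta() takes the place of the internal
mrpp.perms()), and sample() replaces permuted.index(), so strata are ignored.
The sketch does not solve the problem raised in the thread either: dmat is
still copied out to each worker.

```r
## Sketch only: toy data and a toy statistic, not vegan's mrpp() internals.
library(snow)

set.seed(1)
n <- 60
permutations <- 200
coords <- matrix(rnorm(n * 2), ncol = 2)          # hypothetical samples
grouping <- rep(c("a", "b", "c"), each = n / 3)   # hypothetical classes
dmat <- as.matrix(dist(coords))                   # dense n x n matrix

## Hypothetical stand-in for mrpp.perms(): weighted mean within-group distance.
within_delta <- function(g, dmat) {
  classes <- unique(g)
  delta <- sapply(classes, function(k) {
    idx <- which(g == k)
    sub <- dmat[idx, idx]
    mean(sub[lower.tri(sub)])
  })
  weights <- sapply(classes, function(k) sum(g == k)) / length(g)
  sum(weights * delta)
}

## Draw all permutations on the master so the result is reproducible.
perms <- sapply(seq_len(permutations), function(i) sample(grouping))

## Two local socket workers; perms and dmat are passed as arguments, so
## each worker receives its own copy (the memory cost raised in the thread).
cl <- makeCluster(2, type = "SOCK")
m.ds <- parSapply(cl, seq_len(permutations),
                  function(i, perms, dmat, stat) stat(perms[, i], dmat),
                  perms = perms, dmat = dmat, stat = within_delta)
stopCluster(cl)

## Serial reference and a permutation p-value for the observed grouping.
serial <- apply(perms, 2, within_delta, dmat = dmat)
observed <- within_delta(grouping, dmat)
p <- (1 + sum(m.ds <= observed)) / (permutations + 1)
```

With n in the tens of thousands the per-worker copy of dmat dominates, so
this layout probably only pays off when (number of workers) x (n*n doubles)
fits comfortably in memory. The statistic function is passed under the name
stat rather than f, because a named argument f would partially match
parSapply()'s fun argument.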
