Re: [R] error with more than 100 forked processes

2022-04-08 Thread Henrik Bengtsson
The reason you hit the limit already at around 100 workers could be
that you already have other connections open, e.g. file
connections, capture.output(), etc.
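Here is a quick way to check how many connection slots are already in use. This is a minimal sketch. It assumes a stock R build whose connection table has the default 128 slots, of which the first three are stdin, stdout and stderr. It uses only base R's getAllConnections() and showConnections().

```r
# Assumption: stock R build with the default table of 128 connections
max_connections <- 128L

# Numbers of all currently open connections, including stdin/stdout/stderr
open_now <- getAllConnections()
free_slots <- max_connections - length(open_now)
cat("Open connections:", length(open_now), "\n")
cat("Free connection slots:", free_slots, "\n")

# Any connection left open by user code occupies a slot until it is closed
tc <- textConnection("some text")
stopifnot(length(getAllConnections()) == length(open_now) + 1L)
close(tc)

# Show user-visible connections (file(), textConnection(), sockets, ...)
print(showConnections(all = FALSE))
```

In a fresh session this typically leaves 125 free slots. Each fork worker created by makeForkCluster() needs one of them.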

If you want to use *forked* processing with more than 125 workers
using bare-bones R, you can use parallel::mclapply() and friends,
because they don't use socket connections to communicate between the
main process and the workers.
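Here is a minimal sketch of the mclapply() route. It assumes a Unix-alike OS, since forking is not available on Windows. The worker count of 4 is a placeholder. On the machine from the original post, mc.cores could be set to 128.

```r
library(parallel)

# Placeholder worker count; forking requires a Unix-alike OS
n_workers <- 4L

# Each call runs in a forked child process; mclapply() collects the
# results itself instead of keeping one R socket connection per worker
res <- mclapply(seq_len(8), function(i) {
  c(i = i, pid = Sys.getpid())
}, mc.cores = n_workers)

# Which child processes did the work?
print(unique(vapply(res, function(r) r[["pid"]], integer(1))))
```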

If you don't need *forked* processing per se, there are other
alternatives, as already pointed out above.

As the author of the future framework (https://www.futureverse.org/),
I obviously suggest you try that one. It's on CRAN and installs out of
the box on all OSes. You get several alternatives for parallel
backends. For *forked* processing, call plan(multicore) at the top of your
script, and it'll parallelize via the parallel::mclapply() framework
internally, so you won't have the connection limitation to worry
about(*). You can also use plan(future.callr::callr) to parallelize
via the callr package, which also doesn't have the connection
limitation. Your code will be the same regardless of which you end up
using. For the front end, there's future.apply::future_lapply() et
al. (parallel versions of the base lapply functions), furrr::future_map()
et al. (parallel versions of purrr's map functions), and foreach with
doFuture if you like the y <- foreach(...) %dopar% { ... } style.
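Here is a minimal sketch of the future route. It assumes the future and future.apply packages are installed from CRAN. It also assumes a Unix-alike OS, because multicore futures are not supported on Windows, where the plan falls back to sequential processing. The worker count of 4 is a placeholder.

```r
library(future)
library(future.apply)

# Forked processing; change only this line to switch backends
plan(multicore, workers = 4L)
# plan(future.callr::callr, workers = 4L)  # callr backend (needs future.callr)

# Parallel drop-in for base::lapply()
y <- future_lapply(1:8, function(x) x^2)

# foreach-style front end (needs foreach and doFuture):
# library(foreach)
# doFuture::registerDoFuture()
# y2 <- foreach(x = 1:8) %dopar% { x^2 }

# Go back to sequential processing when done
plan(sequential)
```

Only the plan() call differs between backends. The future_lapply() call stays the same.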

(*) But there are other issues with forked processing, e.g. it might
not be compatible with multi-threaded code used by some packages. This
is a problem independent of futures per se.

Hope this helps

Henrik

On Fri, Apr 8, 2022 at 2:19 PM Ivan Krylov  wrote:
>
> On Fri, 8 Apr 2022 22:02:25 +0200
> Guido Kraemer via R-help  wrote:
>
> > > cl <- makeForkCluster(128)
> > Error in UseMethod("sendData") :
> >   no applicable method for 'sendData' applied to an object of class "NULL"
>
> In order to communicate with the workers, R creates connection objects.
> Unfortunately, R's table of connection objects has a statically
> defined limit of 128 entries. (A few connections are used by
> default, and a few more will likely be used by user code during the
> actual program run.)
>
> Try increasing the value of #define NCONNECTIONS in
> src/main/connections.c and recompiling R.
>
> See also: https://github.com/HenrikBengtsson/Wishlist-for-R/issues/28
> According to Henrik Bengtsson, R should work well even with as many
> as 16381 possible connections, but then you may run into OS limits on
> file descriptors.
>
>
> --
> Best regards,
> Ivan
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



[R] error with more than 100 forked processes

2022-04-08 Thread Guido Kraemer via R-help
I am trying to run a parallel job on a computer with many CPUs and get 
the following error:


> library(parallel)
> cl <- makeForkCluster(128)
Error in UseMethod("sendData") :
  no applicable method for 'sendData' applied to an object of class "NULL"

If I scale down to 100 CPUs, there is no error. I can reproduce
this with a self-compiled R 4.1.3 on Ubuntu 20.04 and Manjaro, as well
as with the R binaries that ship with both distributions.



--
Guido Kraemer
Max Planck Institute for Biogeochemistry Jena
Department for Biogeochemical Integration
Hans-Knöll-Str. 10
07745 Jena
Germany

phone: +49 3641 576293
e-mail: gkrae...@bgc-jena.mpg.de
