Incidentally, I was reflecting on this topic the other day and wondering whether BiocParallel could offer something like an OpenMPParam() that sets the number of OpenMP threads to some positive value via omp_set_num_threads(). This would give users a consistent framework for controlling OpenMP behavior in suitably written functions.

One could even imagine a compositional design in which a caller assembles a BPPARAM object like:

bplapply(..., BPPARAM=OpenMPParam(SnowParam(5), 2))

which tells bplapply to spin up 5 workers, each of which may use up to 2 threads (10 threads in total). Implementation-wise, it would be a relatively simple matter of stuffing an extra set-up command into .composeTry; the nthread-setting code can be borrowed from ShortRead.
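To make the idea concrete, here is a minimal C++ sketch of what the per-worker set-up step might amount to on the compiled side. OpenMPParam() does not exist in BiocParallel, and the helpers set_worker_threads() and threads_in_region() are hypothetical; the only real API used is the OpenMP runtime (omp_set_num_threads(), omp_get_max_threads(), omp_get_num_threads()). Everything is guarded by #ifdef _OPENMP so the file still compiles, serially, where OpenMP is unavailable.

```cpp
// Hypothetical worker-side set-up hook for an OpenMPParam-like wrapper.
// Builds with or without OpenMP; without it, everything runs serially.
#ifdef _OPENMP
#include <omp.h>
#endif

// Cap the number of OpenMP threads this worker process may use and
// return the cap actually in effect.
int set_worker_threads(int nthreads) {
#ifdef _OPENMP
    // omp_set_num_threads() requires a positive value, so clamp at 1.
    omp_set_num_threads(nthreads < 1 ? 1 : nthreads);
    return omp_get_max_threads();
#else
    (void) nthreads;  // unused in the serial build
    return 1;         // no OpenMP: a single thread
#endif
}

// Report how many threads a parallel region actually received.
int threads_in_region() {
    int n = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // One thread records the team size; 'n' is shared.
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Each SnowParam worker would call something like set_worker_threads(2) once at start-up, so 5 workers use at most 10 threads. The call has to run inside each worker: SnowParam workers are separate processes, so calling omp_set_num_threads() in the parent R session would not affect them.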

For context: I am planning on moving more parallelization in my packages into OpenMP to get around the overhead of the other backends. Forking is the only approach that is remotely fast enough, but the interaction of forks with the GC is too chaotic in memory-limited environments.

-A

On 5/25/21 10:39 AM, Martin Morgan wrote:
If the BAM files are each processed independently, and each processing task 
takes a while, then it is probably 'good enough' to use R-level parallel 
evaluation using BiocParallel (currently the recommendation for Bioconductor 
packages) or another evaluation framework. Also, presumably you will use Rhtslib, 
which provides C-level access to the hts library. This will require writing C 
/ C++ code to interface between R and the hts library, and will of course be a 
significant undertaking.

It might be worth outlining in a bit more detail what your task is and how (not 
too much detail!) you've tried to implement this in Rsamtools.

Martin Morgan

On 5/24/21, 10:01 AM, "Bioc-devel on behalf of Oleksii Nikolaienko" 
<bioc-devel-boun...@r-project.org on behalf of oleksii.nikolaie...@gmail.com> wrote:

     Dear Bioc team,
     I'd like to ask for your advice on the parallelization within a Bioc
     package. Please point me to a better place if this mailing list is not
     appropriate.
     After a bit of thinking I decided that I'd like to parallelize processing
     at the level of C++ code. Would you strongly advise against this and
     recommend an R approach instead (e.g. "future")?
     If parallel C++ is ok, what would be the best solution for all major OSs?
     My initial choice was OpenMP, but then it seems that Apple has something
     against it (https://mac.r-project.org/openmp/). My own dev environment is
     mostly Big Sur/ARM64, but I wouldn't want to drop its support anyway.

     (On the actual task: loading and specific processing of very large BAM
     files, ideally significantly faster than with Rsamtools as a backend)

     Best,
     Oleksii Nikolaienko


     _______________________________________________
     Bioc-devel@r-project.org mailing list
     https://stat.ethz.ch/mailman/listinfo/bioc-devel
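On the portability question, one common pattern, sketched here under stated assumptions, is to guard every OpenMP construct with #ifdef _OPENMP so the same source builds serially when the compiler provides no OpenMP support, as with Apple's default clang. This assumes the package's src/Makevars sets PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) and PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS), as described in Writing R Extensions; process_chunk() and process_all() are hypothetical placeholders for the real BAM-processing work.

```cpp
// Portable pattern: OpenMP when available, serial fallback otherwise.
#ifdef _OPENMP
#include <omp.h>
#endif

#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-chunk work (e.g. summarising one region);
// here it just sums the values in the chunk.
static std::int64_t process_chunk(const std::vector<int>& chunk) {
    std::int64_t total = 0;
    for (int v : chunk) total += v;
    return total;
}

// Process independent chunks in parallel. Each result goes into its own
// pre-allocated slot, so no locking is needed and output order is fixed.
// Do not call the R API (allocation, Rprintf, interrupt checks) in here.
std::vector<std::int64_t> process_all(const std::vector<std::vector<int>>& chunks,
                                      int nthreads) {
    if (nthreads < 1) nthreads = 1;  // num_threads() needs a positive value
    std::vector<std::int64_t> out(chunks.size());
    const long n = static_cast<long>(chunks.size());
#ifdef _OPENMP
    #pragma omp parallel for num_threads(nthreads) schedule(dynamic)
#endif
    for (long i = 0; i < n; ++i) {
        out[i] = process_chunk(chunks[i]);
    }
    return out;
}
```

Without OpenMP the pragma is simply compiled out and the loop runs on one thread, so macOS users without libomp still get correct (if slower) results. schedule(dynamic) is a choice for chunks of uneven cost; a static schedule may be cheaper when chunks are uniform.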