Incidentally, I was reflecting on this topic the other day and was
wondering whether BiocParallel could have something like OpenMPParam()
that sets the number of threads to some non-zero value via
omp_set_num_threads(). This would provide a consistent framework through
which users could control OpenMP behavior in suitably written functions.
One could even imagine having a composition design where a caller could
assemble a BPPARAM object like:
    bplapply(..., BPPARAM=OpenMPParam(SnowParam(5), 2))
which tells bplapply to spin up 5 workers, each of which is allowed to
use up to 2 threads. Implementation-wise, it would be a relatively
simple matter of stuffing an extra set-up command into .composeTry; the
thread-setting code can be borrowed from ShortRead.
For context: I am planning on moving more parallelization in my packages
into OpenMP to get around the overhead of the other backends. Forking is
the only approach that is remotely fast enough, but the interaction of
forks with the GC is too chaotic in memory-limited environments.
-A
On 5/25/21 10:39 AM, Martin Morgan wrote:
If the BAM files are each processed independently, and each processing task
takes a while, then it is probably 'good enough' to use R-level parallel
evaluation using BiocParallel (currently the recommendation for Bioconductor
packages) or another evaluation framework. Also, presumably you will use Rhtslib,
which provides C-level access to the hts library. This will require writing C
/ C++ code to interface between R and the hts library, and will of course be a
significant undertaking.
It might be worth outlining in a bit more detail what your task is and how (not
too much detail!) you've tried to implement this in Rsamtools.
Martin Morgan
On 5/24/21, 10:01 AM, "Bioc-devel on behalf of Oleksii Nikolaienko"
<bioc-devel-boun...@r-project.org on behalf of oleksii.nikolaie...@gmail.com> wrote:
Dear Bioc team,
I'd like to ask for your advice on the parallelization within a Bioc
package. Please point me to a better place if this mailing list is not
appropriate.
After a bit of thinking I decided that I'd like to parallelize processing
at the level of C++ code. Would you strongly recommend against this and
suggest an R-level approach instead (e.g. "future")?
If parallel C++ is ok, what would be the best solution for all major OSs?
My initial choice was OpenMP, but then it seems that Apple has something
against it (https://mac.r-project.org/openmp/). My own dev environment is
mostly Big Sur/ARM64, but I wouldn't want to drop its support anyway.
(On the actual task: loading and custom processing of very large BAM
files, ideally significantly faster than is possible with Rsamtools as a
backend)
Best,
Oleksii Nikolaienko
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel