On 12/12/2016 06:00 AM, Dario Strbenac wrote:
Good day,
I found that readGAlignmentPairs fails when used inside an mclapply loop but
not an sapply loop. I haven't had such problems with other functions when using
mclapply.
class(mappedToGenomeFiles)
[1] "character"
length(mappedToGenomeFiles)
[1] 13
mappedReadsGenome <- sapply(mappedToGenomeFiles, function(bamFile)
{
readGAlignmentPairs(bamFile, strandMode = 2)
})
# No error. Each item is of GAlignmentPairs class.
But, with mclapply:
mappedReadsGenome <- mclapply(mappedToGenomeFiles, function(bamFile)
{
readGAlignmentPairs(bamFile, strandMode = 2)
}, mc.cores = 7)
Warning message:
In mclapply(mappedToGenomeFiles, function(bamFile) { :
scheduled cores 6, 5, 3, 1, 4, 2 encountered errors in user code, all values
of the jobs will be affected
mappedReadsGenome
[[1]]
[1] "fatal error in wrapper code"
attr(,"class")
[1] "try-error"
[[2]]
[1] "fatal error in wrapper code"
attr(,"class")
[1] "try-error"
If the return value is large and R tries to serialize it, the serialized
vector may be too large to be represented in R. You could try
length(serialize(readGAlignmentPairs(bamFile, strandMode=2), NULL))
to test whether this causes the error.
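As a rough illustration of that check, here is a base R sketch that measures the serialized size of an object and compares it with the largest length an ordinary (non-long) vector can have. The helper name serialized_size, the numeric stand-in object, and the choice of .Machine$integer.max as the threshold are assumptions made for this example; whether that threshold is the one that matters for mclapply in a given R version is not confirmed here.

```r
# Hypothetical helper (not from the thread): number of bytes in the
# serialized form of an object, as returned by serialize(x, NULL).
serialized_size <- function(x) length(serialize(x, NULL))

# Assumed threshold: 2^31 - 1, the maximum length of a non-long vector.
limit <- .Machine$integer.max

# Small stand-in object; in practice this would be the GAlignmentPairs
# object returned for one BAM file.
stand_in <- runif(1e5)

n_bytes <- serialized_size(stand_in)
fits <- n_bytes < limit

cat("serialized bytes:", n_bytes, "\n")
cat("below assumed limit:", fits, "\n")
```

Running the same helper on the real return value for one file would show how close it comes to that size.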
With parallel evaluation you generally want to minimize the amount of
data communicated (in both directions) between manager and worker. And
since workers are contending for memory on the same machine, you
generally want to adopt strategies like
bf = BamFile(bamFile, yieldSize=1000000)
GenomicFiles::reduceByYield(bf, ...)
that iterate through the large object in moderate-sized chunks.
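The chunking idea can be illustrated without a BAM file. The sketch below is a base R stand-in, not GenomicFiles code: fake_files, chunk_size and count_in_chunks are hypothetical names, and each "file" is just a record count. Each worker walks through its input in fixed-size chunks and sends back a single number, so little data travels between worker and manager.

```r
library(parallel)

# Hypothetical stand-in for large inputs: each "file" is a record count.
fake_files <- c(a = 2500L, b = 1200L, c = 4000L)
chunk_size <- 1000L  # plays the role of yieldSize

# Walk through one input in chunks, keeping only a running total.
count_in_chunks <- function(n_records, chunk_size) {
  total <- 0
  start <- 1L
  while (start <= n_records) {
    end <- min(start + chunk_size - 1L, n_records)
    chunk <- seq.int(start, end)    # stand-in for reading one chunk
    total <- total + length(chunk)  # summarise the chunk, then accumulate
    start <- end + 1L
  }
  total
}

# mclapply forks, so fall back to a single core on Windows.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker returns one small number rather than a large object.
totals <- mclapply(fake_files, count_in_chunks,
                   chunk_size = chunk_size, mc.cores = cores)
print(unlist(totals))
```

With real data the per-chunk summary would be whatever is actually needed downstream (counts, coverage, and so on) rather than a record count.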
Martin
.
.
.
[[7]]
GAlignmentPairs object with 41860576 pairs, strandMode=2, and 0 metadata columns:
             seqnames strand :                 ranges --                 ranges
                <Rle>  <Rle> :              <IRanges> --              <IRanges>
         [1]    chr14      + :   [19010525, 19010623] --   [19010414, 19010513]
         [2]    chr14      + :   [19010543, 19010612] --   [19010505, 19010604]
         [3]    chr14      + :   [19010608, 19010707] --   [19010577, 19010676]
         [4]    chr14      + :   [19011187, 19011286] --   [19011142, 19011241]
         [5]    chr14      + :   [19011318, 19011415] --   [19011187, 19011286]
         ...      ...    ... ...                  ... ...                  ...
  [41860572]     chr4      + : [190972787, 190972886] -- [190972685, 190972784]
  [41860573]     chr4      - : [190974302, 190974385] -- [190974302, 190974385]
  [41860574]     chr4      - : [190978480, 190978579] -- [190978542, 190978641]
  [41860575]     chr4      - : [190982116, 190982215] -- [190982125, 190982224]
  [41860576]     chr4      + : [191031678, 191031776] -- [191031630, 191031729]
  -------
  seqinfo: 25 sequences from an unspecified genome
.
.
.
[[13]]
[1] "fatal error in wrapper code"
attr(,"class")
[1] "try-error"
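One way to see which elements of a result like this failed is to test each one for the "try-error" class. The sketch below builds a small stand-in list by hand instead of calling mclapply, so the list contents are assumptions made for illustration only.

```r
# Stand-in for a result list in which some jobs failed.
# try(..., silent = TRUE) yields an object of class "try-error".
res <- list(
  try(stop("fatal error in wrapper code"), silent = TRUE),
  42L,
  try(stop("fatal error in wrapper code"), silent = TRUE)
)

# TRUE for every element that is a try-error.
failed <- vapply(res, inherits, logical(1), what = "try-error")

which(failed)            # positions of the failed jobs
ok_results <- res[!failed]  # the elements that came back intact
```

The positions in which(failed) could then be used to rerun only the affected files serially.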
Interestingly, reading in from one of the thirteen file paths worked.
In contrast, a simple test case of the same length works:
X=1:13
mclapply(X, function(x) x + 1, mc.cores = 7) # Returns a list holding 2 through 14.
The BAM file import also works with bplapply and BPPARAM =
MulticoreParam(workers = 7).
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicAlignments_1.10.0 SummarizedExperiment_1.4.0 GenomicFeatures_1.26.0 AnnotationDbi_1.36.0 Biobase_2.34.0
[6] Rsamtools_1.26.1 Biostrings_2.42.0 XVector_0.14.0 GenomicRanges_1.26.1 GenomeInfoDb_1.10.1
[11] IRanges_2.8.1 S4Vectors_0.12.0 BiocGenerics_0.20.0
loaded via a namespace (and not attached):
[1] zlibbioc_1.20.0 BiocParallel_1.8.1 lattice_0.20-34 tools_3.3.2 grid_3.3.2 DBI_0.5-1 Matrix_1.2-7.1
[8] rtracklayer_1.34.1 bitops_1.0-6 RCurl_1.95-4.8 biomaRt_2.30.0 RSQLite_1.0.0 XML_3.98-1.5
--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel