On 12/12/2016 06:00 AM, Dario Strbenac wrote:
Good day,

I found that readGAlignmentPairs fails when used inside an mclapply loop but 
not an sapply loop. I haven't had such problems with other functions when using 
mclapply.

class(mappedToGenomeFiles)
[1] "character"
length(mappedToGenomeFiles)
[1] 13

mappedReadsGenome <- sapply(mappedToGenomeFiles, function(bamFile)
  {
      readGAlignmentPairs(bamFile, strandMode = 2)
  })
# No error. Each item is of GAlignmentPairs class.

But, with mclapply:

mappedReadsGenome <- mclapply(mappedToGenomeFiles, function(bamFile)
  {
      readGAlignmentPairs(bamFile, strandMode = 2)
  }, mc.cores = 7)
Warning message:
In mclapply(mappedToGenomeFiles, function(bamFile) { :
  scheduled cores 6, 5, 3, 1, 4, 2 encountered errors in user code, all values of the jobs will be affected
mappedReadsGenome
[[1]]
[1] "fatal error in wrapper code"
attr(,"class")
[1] "try-error"
[[2]]
[1] "fatal error in wrapper code"
attr(,"class")
[1] "try-error"

If the return value is large and R tries to serialize it, the serialized vector may be too large to be represented in R. You could try

  length(serialize(readGAlignmentPairs(bamFile, strandMode=2), NULL))

to test whether this causes the error.
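
As a rough illustration of that check, here is a minimal sketch that measures the serialized size of ordinary R objects. The helper name serializedSize is hypothetical, and comparing against .Machine$integer.max assumes that the ceiling of concern is 2^31 - 1 bytes; that threshold is a guess, not something confirmed for this failure.

```r
# Hypothetical helper (not part of any package): number of bytes in the
# serialized form of an object, i.e. roughly what a worker would send back.
serializedSize <- function(x) length(serialize(x, NULL))

small <- 1:10          # a tiny object
large <- runif(1e6)    # about 8 MB of doubles

sizeSmall <- serializedSize(small)
sizeLarge <- serializedSize(large)

# Assumption: 2^31 - 1 bytes is the limit worth comparing against.
fitsInLimit <- sizeLarge < .Machine$integer.max
```

Applied to one of the BAM files, the same helper would be called on the result of readGAlignmentPairs(bamFile, strandMode=2).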

With parallel evaluation you generally want to minimize the amount of data communicated (in both directions) between manager and worker. And since workers are contending for memory on the same machine, you generally want to adopt strategies like

    bf = BamFile(bamFile, yieldSize=1000000)
    GenomicFiles::reduceByYield(bf, ...)

that iterate through the large object in moderate-sized chunks.
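
A minimal sketch of that chunked pattern follows, assuming the small ex1.bam file shipped with Rsamtools as a stand-in for a large BAM file. The choice of summary (counting pairs per chunk and adding the counts) and the small yieldSize are illustrative only, and setting asMates = TRUE explicitly is an assumption made so that mates are kept together when reading in chunks.

```r
library(Rsamtools)          # BamFile()
library(GenomicAlignments)  # readGAlignmentPairs()
library(GenomicFiles)       # reduceByYield()

# Small example file from Rsamtools; stands in for a large BAM file.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Read in chunks; asMates = TRUE is an assumption for paired-end input.
bf <- BamFile(fl, yieldSize = 500, asMates = TRUE)

# YIELD reads the next chunk, MAP reduces it to a small summary,
# REDUCE combines the summaries from successive chunks.
YIELD <- function(x, ...) readGAlignmentPairs(x, strandMode = 2)
MAP <- function(value, ...) length(value)
REDUCE <- `+`

nPairs <- reduceByYield(bf, YIELD, MAP, REDUCE)
```

Only the per-chunk count is held between iterations, so memory use is governed by yieldSize rather than by the size of the file.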

Martin

           .
           .
           .
[[7]]
GAlignmentPairs object with 41860576 pairs, strandMode=2, and 0 metadata columns:
             seqnames strand   :                 ranges  --                 ranges
                <Rle>  <Rle>   :              <IRanges>  --              <IRanges>
         [1]    chr14      +   :   [19010525, 19010623]  --   [19010414, 19010513]
         [2]    chr14      +   :   [19010543, 19010612]  --   [19010505, 19010604]
         [3]    chr14      +   :   [19010608, 19010707]  --   [19010577, 19010676]
         [4]    chr14      +   :   [19011187, 19011286]  --   [19011142, 19011241]
         [5]    chr14      +   :   [19011318, 19011415]  --   [19011187, 19011286]
         ...      ...    ... ...                    ... ...                    ...
  [41860572]     chr4      +   : [190972787, 190972886]  -- [190972685, 190972784]
  [41860573]     chr4      -   : [190974302, 190974385]  -- [190974302, 190974385]
  [41860574]     chr4      -   : [190978480, 190978579]  -- [190978542, 190978641]
  [41860575]     chr4      -   : [190982116, 190982215]  -- [190982125, 190982224]
  [41860576]     chr4      +   : [191031678, 191031776]  -- [191031630, 191031729]
  -------
  seqinfo: 25 sequences from an unspecified genome
           .
           .
           .
[[13]]
[1] "fatal error in wrapper code"
attr(,"class")
[1] "try-error"

Interestingly, reading in from one of the thirteen file paths worked.

In contrast, a simple test case of the same length works:

X=1:13
mclapply(X, function(x) x + 1, mc.cores = 7) # Prints 2:14.

The BAM file import also works with bplapply and BPPARAM = MulticoreParam(workers = 7).
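
For reference, the simple test case above can be written with bplapply as in the following sketch. The toy input and the use of two workers are illustrative choices, not the settings from the report.

```r
library(BiocParallel)

# Toy input of the same length as the vector of BAM file paths.
X <- 1:13

# MulticoreParam uses forked workers, so it is intended for Linux and macOS.
param <- MulticoreParam(workers = 2)

# Same computation as the mclapply test case, run through BiocParallel.
result <- bplapply(X, function(x) x + 1, BPPARAM = param)
```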

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] GenomicAlignments_1.10.0   SummarizedExperiment_1.4.0 GenomicFeatures_1.26.0     AnnotationDbi_1.36.0       Biobase_2.34.0
 [6] Rsamtools_1.26.1           Biostrings_2.42.0          XVector_0.14.0             GenomicRanges_1.26.1       GenomeInfoDb_1.10.1
[11] IRanges_2.8.1              S4Vectors_0.12.0           BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] zlibbioc_1.20.0    BiocParallel_1.8.1 lattice_0.20-34    tools_3.3.2        grid_3.3.2         DBI_0.5-1          Matrix_1.2-7.1
 [8] rtracklayer_1.34.1 bitops_1.0-6       RCurl_1.95-4.8     biomaRt_2.30.0     RSQLite_1.0.0      XML_3.98-1.5

--------------------------------------
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


