Re: [Bioc-sig-seq] Parallel version of the Biostrings::read.DNAStringSet and write.XStringSet functions ?

mtmorgan Wed, 03 Mar 2010 15:28:47 -0800

Quoting Sirisha Sunkara <[email protected]>:

Hello,
A newbie question: is there a parallel version available to work on large fasta files, for these functions already?


Hi Sirisha --

In general no, these (and other R) functions are not parallelized. The usual strategy would be to write a script that operates on one file or other 'chunk' of data, and then use one of snow ('easiest'), multicore (best for multiple cores on a linux computer), or Rmpi (computation distributed across clusters) to do a version of 'lapply' (e.g., mclapply, mpi.parLapply) that is distributed across cores / nodes.
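
As a rough illustration of the chunk-then-lapply strategy, here is a minimal sketch. It makes several assumptions that are not part of the advice above: it uses mclapply from the 'parallel' package that ships with current R (rather than the separate multicore package), the worker 'count_records' and the file names are hypothetical, and the worker only counts FASTA records with base R so that the example runs without Biostrings. In a real script the worker body is where read.DNAStringSet and the per-file analysis would go.

```r
## Sketch only: 'count_records' and the file names are hypothetical.
library(parallel)  # ships with current R; provides mclapply()

## Per-chunk worker: counts the FASTA records (lines starting with '>')
## in one file. A real worker would read and process the sequences here.
count_records <- function(path) {
    lines <- readLines(path)
    sum(startsWith(lines, ">"))
}

## Two small FASTA files stand in for the chunks of a large data set.
dir <- tempfile("fasta_chunks")
dir.create(dir)
files <- file.path(dir, c("chunk1.fa", "chunk2.fa"))
writeLines(c(">seq1", "ACGT", ">seq2", "GGCC"), files[1])
writeLines(c(">seq3", "TTAA"), files[2])

## mclapply() forks worker processes on Unix-alikes; on Windows it only
## supports mc.cores = 1, so fall back to serial evaluation there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
counts <- mclapply(files, count_records, mc.cores = cores)
names(counts) <- basename(files)
print(unlist(counts))
```

Each file is handled by an independent call to the worker, so the same function could be passed unchanged to a snow or Rmpi variant of lapply; only the dispatch line would differ.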

For read/write.*StringSet, the basic limitation is disk i/o, and you might investigate where your data resides relative to the computer doing the analysis, e.g., data on a networked file system can have significant latency. Also, parallelizing on a single machine (e.g., multiple cores) means that the resources of that machine are shared by several processes, so one might expect to quickly run into memory or i/o throughput limitations.


Martin


Thank You,

--sirisha

sessionInfo()

R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.14.12 IRanges_1.4.11

loaded via a namespace (and not attached):
[1] Biobase_2.6.1

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

