Quoting Sirisha Sunkara <[email protected]>:

Hello,

A newbie question: is there already a parallel version of these functions for working on large FASTA files?

Hi Sirisha --

In general no, these (and other R) functions are not parallelized. The usual strategy would be to write a script that operates on one file or other 'chunk' of data, and then use one of snow ('easiest'), multicore (best for multiple cores on a single Linux computer), or Rmpi (computation distributed across clusters) to run a version of 'lapply' (e.g., mclapply, mpi.parLapply) that is distributed across cores / nodes.
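
As a rough illustration of the 'one chunk per worker' strategy, here is a minimal sketch using mclapply. It assumes a current R, where mclapply is provided by the 'parallel' package rather than 'multicore'. The worker count_fasta_records and the two small example files are hypothetical stand-ins; in a real analysis the worker would call the Biostrings reader and do the actual per-file work.

```r
library(parallel)  # assumption: current R, where 'parallel' provides mclapply()

# Hypothetical per-file worker: count the FASTA records in one file.
# Replace the body with the real per-file analysis (e.g., a Biostrings read).
count_fasta_records <- function(path) {
    lines <- readLines(path)
    sum(startsWith(lines, ">"))   # each record starts with a '>' header line
}

# Hypothetical input: two small FASTA files in a temporary directory
dir <- tempfile("fasta_chunks_")
dir.create(dir)
writeLines(c(">seq1", "ACGT", ">seq2", "GGCC"), file.path(dir, "a.fasta"))
writeLines(c(">seq3", "TTAA"), file.path(dir, "b.fasta"))
files <- list.files(dir, pattern = "\\.fasta$", full.names = TRUE)

# mclapply relies on forking, so use a single core on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# One file per task, distributed across the available cores
counts <- mclapply(files, count_fasta_records, mc.cores = n_cores)
names(counts) <- basename(files)
```

Each worker reads its own file, so the total memory in use grows with the number of cores; choosing mc.cores well below the core count may be sensible for large files.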

For read/write.*StringSet, the basic limitation is disk I/O, and you might investigate where your data resides relative to the computer doing the analysis, e.g., data on a networked file system can have significant latency. Also, parallelizing on a single machine (e.g., multiple cores) means that the resources of that machine are shared by several processes, so one might expect to quickly run into memory or I/O throughput limitations.

Martin


Thank You,

--sirisha

sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.14.12 IRanges_1.4.11

loaded via a namespace (and not attached):
[1] Biobase_2.6.1

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

