Quoting Sirisha Sunkara:
Hello,
A newbie question: is a parallel version of these functions already
available for working with large FASTA files?
Hi Sirisha --
In general, no; these (and other R) functions are not parallelized. The
usual strategy is to write a script that operates on one file or other
'chunk' of data, and then use snow ('easiest'), multicore (best for
multiple cores on a single Linux computer), or Rmpi (computation
distributed across a cluster) to run a parallel version of 'lapply'
(e.g., mclapply, mpi.parLapply) that spreads the work across cores or
nodes.
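A minimal sketch of that strategy, assuming plain-text FASTA files and base R's 'parallel' package (R >= 2.14.0, which absorbed multicore's mclapply and snow's parLapply; with R 2.10.1, as in the sessionInfo() output, mclapply came from the separate multicore package). The helper summarize_fasta() and the file names are hypothetical; a real per-chunk function might call Biostrings instead of readLines.

```r
## Sketch: summarize several FASTA files in parallel, one file per task.
## summarize_fasta() is a hypothetical helper, not part of any package.
library(parallel)

## Process ONE chunk (here, one FASTA file) and return a small summary,
## so only the result, not the sequence data, goes back to the master.
summarize_fasta <- function(path) {
  lines <- readLines(path)
  is_header <- grepl("^>", lines)
  c(records  = sum(is_header),
    residues = sum(nchar(lines[!is_header])))
}

## Write three small example FASTA files to a temporary directory.
files <- file.path(tempdir(), sprintf("chunk%d.fa", 1:3))
writeLines(c(">a", "ACGT", ">b", "GG"), files[1])
writeLines(c(">c", "ACGTACGT"), files[2])
writeLines(c(">d", "A", "CC", ">e", "T"), files[3])

## mclapply forks the R process; forking is unavailable on Windows,
## where mc.cores must be 1. Keeping mc.cores small also bounds how many
## processes compete for the machine's memory and disk I/O.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(files, summarize_fasta, mc.cores = cores)

## Combine the per-file summaries into one matrix, one row per file.
summary_mat <- do.call(rbind, res)
rownames(summary_mat) <- basename(files)
print(summary_mat)
print(colSums(summary_mat))
```

The same helper could be handed to snow-style workers with cl <- makeCluster(2) followed by parLapply(cl, files, summarize_fasta) and stopCluster(cl), or to Rmpi's mpi.parLapply on a cluster; only the lapply-like call changes, not the per-chunk function.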
For read/write.*StringSet, the basic limitation is disk I/O, so you
might investigate where your data resides relative to the computer
doing the analysis; for example, data on a networked file system can
have significant latency. Also, parallelizing on a single machine
(e.g., across multiple cores) means that several processes share that
machine's resources, so one might expect to run quickly into memory or
I/O throughput limitations.
Martin
Thank You,
--sirisha
> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Biostrings_2.14.12 IRanges_1.4.11

loaded via a namespace (and not attached):
[1] Biobase_2.6.1
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing