I'd like to start an open discussion on the topic of parallelization for
NGS data. I noticed that Galaxy recently came out with a cloud-based
interface using Amazon EC2. I've been trying to learn more about how these
NGS analysis algorithms (for alignment, assembly, etc.) are actually
implemented in a parallel fashion, but I have had trouble finding specific
documentation and resources describing how it works and how it is
implemented. Any direction/resources that people can provide would be much
appreciated.

Also, I have seen some papers describing parallelization of various
specific algorithms, especially recently (such as PASQUAL from Georgia
Tech), but they all seem to be operating on relatively "small" networks of
distributed computing resources. Does anyone have any idea about how far
the parallelization and speeding up of these analyses can be pushed? How
difficult would it be to implement something that runs on a distributed
network of say 100,000 computers, or even more... say a million? Is there a
bottleneck somewhere that would prevent that from being feasible for NGS
analysis? Or would that make the analyses amazingly fast compared to what's
available now? I'm thinking of a system like what the SETI@home project has set
up for their distributed computing user base and wondering what the limits
are and how one could implement such a system if the user base is already
in place.
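
To make the scaling question a bit more concrete, here is a rough
back-of-the-envelope sketch using Amdahl's law. It assumes that some fixed
fraction of an analysis cannot be parallelized (for example, merging results
or shared I/O). The function name and the serial fractions below are
hypothetical values picked only for illustration, not measurements from any
real NGS tool.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Ideal speedup of a fixed-size job under Amdahl's law.

    serial_fraction: share of the runtime that cannot be parallelized (0..1).
    workers: number of machines working on the parallel part.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


if __name__ == "__main__":
    # Hypothetical serial fractions, chosen only to illustrate the trend.
    for serial_fraction in (0.01, 0.001):
        for workers in (100, 100_000, 1_000_000):
            speedup = amdahl_speedup(serial_fraction, workers)
            print(
                f"serial={serial_fraction:<6} "
                f"workers={workers:>9,} "
                f"speedup={speedup:8.1f}x"
            )
```

Under this simplified model the speedup can never exceed
1 / serial_fraction, no matter how many machines are added, which is the
kind of bottleneck I am wondering about. It ignores network transfer and
scheduling overhead, which would presumably matter a great deal on a
volunteer network.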