Thank you for the suggestions. Ultimately, I would like to compare
gene (isoform) expression between two groups of 10 animals with one lane per
animal. I am using the public server to practice with some small data sets
right now, but will be getting the real data very soon and plan on using an
Amazon Cloud account to actually do the analysis. I can see now that this
approach will run into difficulty given the current data volume restrictions
and the limited Cuffcompare/Cuffdiff functionality in Galaxy. Can you comment
any further on the timeline of the
availability of the full functionality of these programs? You seemed to
suggest they will be available on the public server before they are available
on the Cloud?
Also, for the time being, would you mind clarifying for me what you mean by
repeatedly merging Cufflinks outputs? I imagine using Tophat to map the reads
and find splice junctions and assembling transcripts using Cufflinks for each
of the 20 animals. Are you talking about running the Cufflinks GTF output
through Cuffcompare, which allows two GTF files in Galaxy, then merging that
output (the union file) with the third Cufflinks file, and so on for all ten
animals? Then doing the same thing for the other group of ten animals, and
comparing the two for a rough idea of the differences? I guess I'm wondering
how far I will be able to get with the analysis as things stand on the Cloud or
the public server. I also need to come up with a strategy to work around
the 1000 GB space limit; with 20 samples of 25 million reads each and files
being generated repeatedly, I think it will get used up quickly.
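
To make the repeated pairwise merge concrete, here is a minimal sketch of the
order in which the Cuffcompare runs would be chained for one group. It only
plans the steps and runs nothing; the file names and the "union" labels are
hypothetical placeholders, not names that Galaxy or Cuffcompare produce.

```python
from typing import List, Tuple


def pairwise_merge_plan(gtfs: List[str], prefix: str) -> List[Tuple[str, str, str]]:
    """Plan a chain of two-input merges over a list of GTF files.

    Each step is (input_a, input_b, output_label). The output of one step
    becomes input_a of the next, so N files need N - 1 merges.
    """
    if len(gtfs) < 2:
        raise ValueError("need at least two GTF files to merge")
    steps = []
    current = gtfs[0]
    for i, nxt in enumerate(gtfs[1:], start=1):
        out = f"{prefix}_union_{i}"  # hypothetical label for the merged output
        steps.append((current, nxt, out))
        current = out
    return steps


# Hypothetical Cufflinks outputs for the ten animals in one group.
treated = [f"treated_{n:02d}.gtf" for n in range(1, 11)]
plan = pairwise_merge_plan(treated, "treated")
for a, b, out in plan:
    print(f"Cuffcompare({a}, {b}) -> {out}")
```

For ten animals this gives nine merges per group, and each intermediate union
file counts against the space limit unless it is deleted once the next merge
has finished.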
As far as changing the bowtie options through Tophat, I was just going to play
around with the bowtie mapping settings to get an idea of which strategy is
optimal and use those settings for Tophat, but this is probably unnecessary in
the grand scheme of the analysis.
I really appreciate your help - Thanks,
From: Jeremy Goecks on behalf of Jeremy Goecks
Sent: Tue 1/18/2011 9:29 PM
To: Martin, David A.
Cc: eaf...@emory.edu; galaxy-u...@bx.psu.edu
Subject: Re: [galaxy-user] Galaxy for gene expression comparison
> I am comparing RNA expression in two groups of rats, a drug treated group
> against a control group. There are 10 biological replicates in each group. I
> am unsure of how to flow this analysis through Galaxy using Tophat followed
> by Cufflinks/compare/diff. Should the files for each group be merged at any
> point? I would think they should be kept separate in order to properly
> account for the spread across animals. I am just a little unsure of how to
> group the files on galaxy, and where to differentiate biological and
> technical replicates.
Yes, you're right -- merging along the way will prevent you from quantifying
within-group variation; consequently, quantifying across-group variation will
be very challenging as well. Here's the right thing to do:
(1) map each replicate using Tophat and assemble transcripts using Cufflinks;
(2) for all Cufflinks' outputs (assembled transcripts), build a set of
comprehensive transcripts using Cuffcompare;
(3) for Cuffdiff, group the replicates from each group and let Cuffdiff
determine and quantify within-group and across-group variation.
However, Galaxy's tools currently don't support replicates, so you can't yet
perform this analysis. We're working to enhance them, and we should have this
functionality available on our main server in the next couple of weeks.
Enis can comment about when this functionality will be available on the cloud.
(To be clear, you can perform step 1 using Galaxy now. You can also perform
step 2, but you'll have to do so by repeatedly merging Cufflinks' outputs. You
cannot perform step 3 right now with Galaxy.)
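
For reference, outside Galaxy the three steps above correspond roughly to the
command lines built by the sketch below. It only assembles and prints the
commands. The sample names, the Bowtie index name, and the output directory
names are hypothetical, and the output file names (accepted_hits.bam,
transcripts.gtf, <prefix>.combined.gtf) are assumptions that depend on the
installed Tophat/Cufflinks version, so check them against your own runs.

```python
from typing import Dict, List


def build_commands(groups: Dict[str, List[str]], bowtie_index: str) -> List[List[str]]:
    """Assemble command lines for steps 1-3; nothing is executed here."""
    cmds: List[List[str]] = []
    gtfs: List[str] = []
    alignments_by_group: List[str] = []
    for samples in groups.values():
        alignments = []
        for s in samples:
            th_out = f"{s}_tophat"      # hypothetical output directories
            cl_out = f"{s}_cufflinks"
            hits = f"{th_out}/accepted_hits.bam"  # assumed Tophat output name
            # Step 1: map each replicate, then assemble its transcripts.
            cmds.append(["tophat", "-o", th_out, bowtie_index, f"{s}.fastq"])
            cmds.append(["cufflinks", "-o", cl_out, hits])
            gtfs.append(f"{cl_out}/transcripts.gtf")  # assumed Cufflinks output name
            alignments.append(hits)
        # Cuffdiff takes the replicates of one condition as a comma-separated list.
        alignments_by_group.append(",".join(alignments))
    # Step 2: one comprehensive transcript set from all assemblies.
    cmds.append(["cuffcompare", "-o", "all_samples"] + gtfs)
    # Step 3: one argument per group, so replicates stay separate within it.
    cmds.append(["cuffdiff", "-o", "diff_out", "all_samples.combined.gtf"]
                + alignments_by_group)
    return cmds


# Two replicates per group here for brevity; the real design has ten each.
example = {"control": ["c1", "c2"], "treated": ["t1", "t2"]}
for cmd in build_commands(example, "rat_index"):
    print(" ".join(cmd))
```

The point of the last command is that replicates are grouped per condition
rather than merged, which is what lets Cuffdiff estimate within-group
variation.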
> On a different note, is there a way to control the bowtie mapping parameters
> more closely when using tophat?
There's limited control that you can exert over the bowtie commands within
Tophat. Looking at the Tophat manual:
it looks like max-multihits (Maximum number of alignments) is the only Bowtie
parameter you can directly control. There are, however, many Tophat parameters
that enable you to control splice junction mapping directly; set Tophat's
settings to 'Full parameter list' to see all the parameters you can control.
What exactly are you looking to do?
galaxy-user mailing list