Re: [galaxy-user] Identification of replicate outlier

Dave Corney Fri, 09 Nov 2012 09:22:18 -0800

Hi Ross,

Thanks for the suggestions. I'm aware that this is not really a
Galaxy-specific question, and I've been browsing through SeqAnswers and
found a couple of suggestions using edgeR or DESeq, but nothing for the Tuxedo
suite. However, I have no experience with either of these tools, so I was
wondering how others have approached this problem if their workflow is
based on Cufflinks.


In the meantime, I'll go through your suggestions and see where I get.

Thanks,
Dave


On Thu, Nov 8, 2012 at 7:21 PM, Ross <[email protected]> wrote:

> Hi Dave,
> This is an interesting and non-trivial question that extends well
> beyond Galaxy, and there's no simple solution AFAIK.
> Defining an 'outlier' tends to boil down to subjective judgement in
> most real cases I've seen.
> EG: see
> http://comments.gmane.org/gmane.science.biology.informatics.conductor/40927
>
> My 2c worth:
> a) confirm that all of your sample library sizes and quality score
> distributions are comparable with the FastQC tool. A sample with
> relatively low library size may indicate an upstream technical failure
> with (eg) RNA extraction or a flowcell lane.
> b) check that the numbers of unique alignments to the reference are
> similar across samples (eg picard alignment summary metrics or even
> the samtools flagstat tool)
> c) if you can create an appropriate input matrix (eg read counts by
> exon or other contig for each sample), the Principal Component Analysis
> tool might be helpful (library size normalization is one devil that
> lies in the detail, and PCA is not quite the same as MDS - see below)
> d) If you're an R hacker, you might find
>
> http://gettinggeneticsdone.blogspot.com.au/2012/09/deseq-vs-edger-comparison.html
> useful - it shows how to get MDS plots which are probably the most
> reliable way to identify samples that don't cluster well with the
> other members of their tribe
>
>
>
> On Fri, Nov 9, 2012 at 10:22 AM, Dave Corney <[email protected]>
> wrote:
> > Hello list,
> >
> > I've been analyzing an experiment with two groups each with three
> > replicates. My workflow was TopHat (paired end) -> Cufflinks -> CuffDiff.
> > Unfortunately, there are not many significant differences identified by
> > CuffDiff.
> >
> > I am wondering whether one of my replicates might be an outlier. Does
> > anybody have a suggestion on how to search for an outlier? The quality
> > statistics of the unprocessed data looked equally good for all samples,
> > so I don't think that this is a problem.
> >
> > Thanks,
> > Dave
> >
> >
> > ___________________________________________________________
> > The Galaxy User list should be used for the discussion of
> > Galaxy analysis and other features on the public server
> > at usegalaxy.org.  Please keep all replies on the list by
> > using "reply all" in your mail client.  For discussion of
> > local Galaxy instances and the Galaxy source code, please
> > use the Galaxy Development list:
> >
> >   http://lists.bx.psu.edu/listinfo/galaxy-dev
> >
> > To manage your subscriptions to this and other Galaxy lists,
> > please use the interface at:
> >
> >   http://lists.bx.psu.edu/
>
>
>
> --
> Ross Lazarus MBBS MPH;
> Head, Medical Bioinformatics, BakerIDI; Tel: +61 385321444
> http://scholar.google.com/citations?hl=en&user=UCUuEM4AAAAJ
>
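
Suggestions (c) and (d) in the quoted message, a library-size
normalization followed by a PCA of the samples, could be sketched roughly
as below. This is only a minimal illustration in Python with NumPy, not
the edgeR/DESeq code from the linked post and not a Galaxy tool. The
function names (log_cpm, sample_pca), the 0.5 pseudocount, and the
simulated counts are all assumptions made for the example; real input
would be a genes-by-samples read count matrix, and in a two-group design
the first component may reflect the group difference rather than an
outlier.

```python
import numpy as np


def log_cpm(counts, pseudocount=0.5):
    """Scale a genes x samples count matrix by library size, then log2.

    The pseudocount is an arbitrary choice that avoids log2(0).
    """
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)                # total reads per sample
    return np.log2((counts + pseudocount) / lib_sizes * 1e6)


def sample_pca(counts, n_components=2):
    """Return (coords, explained) from a PCA of the samples.

    coords has one row per sample; explained is the fraction of total
    variance carried by each returned component.
    """
    x = log_cpm(counts).T                         # samples x genes
    x = x - x.mean(axis=0)                        # centre each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    coords = u * s                                # component scores per sample
    explained = s**2 / np.sum(s**2)
    return coords[:, :n_components], explained[:n_components]


# Simulated data (hypothetical): 500 genes, 6 samples, no group effect.
rng = np.random.default_rng(0)
base = rng.gamma(shape=2.0, scale=50.0, size=500)
counts = rng.poisson(np.tile(base[:, None], (1, 6)))
# Distort the last sample by inflating its first 100 genes 8-fold.
counts[:100, 5] = rng.poisson(base[:100] * 8)

coords, explained = sample_pca(counts)
# Flag the sample lying furthest from the median position on PC1.
distance = np.abs(coords[:, 0] - np.median(coords[:, 0]))
suspect = int(np.argmax(distance))
print("PC1 coordinates:", np.round(coords[:, 0], 2))
print("sample furthest from the rest on PC1:", suspect)
```

Plotting the first two columns of coords and colouring the points by
group gives roughly the kind of picture an MDS plot provides. Whether a
distant sample is really an outlier remains the judgement call described
in the quoted message.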

