Hi Ivan,
  You are skipping over one part of the pipeline: using the ShortRead
package to read in your data and perform some sort of QA.  The output will be
the aligned reads, but you should take the diagnostics seriously. We find lots
of problems that need to be caught early so that the downstream analyses are
reasonable.
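
  As a rough illustration of what "reduced to a set of alignment start
positions (including orientation)" could look like, here is a minimal sketch
in plain Python rather than ShortRead code. The AlignedRead record, its field
names, the quality scale and the cutoff value are all made up for the example;
they are not taken from the workflow document.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    """Hypothetical record for one aligned read (not a ShortRead class)."""
    chrom: str      # chromosome the read aligned to
    position: int   # alignment position as reported by the aligner
    strand: str     # orientation, "+" or "-"
    quality: int    # alignment quality score (scale assumed for illustration)


def reduce_to_starts(reads, min_quality):
    """Apply a quality cutoff and keep only (chrom, position, strand)."""
    return [
        (r.chrom, r.position, r.strand)
        for r in reads
        if r.quality >= min_quality
    ]


# Toy input: four aligned reads, one of them with a low score.
reads = [
    AlignedRead("chr1", 100, "+", 40),
    AlignedRead("chr1", 100, "+", 38),
    AlignedRead("chr1", 100, "-", 35),
    AlignedRead("chr2", 250, "-", 5),
]

# The cutoff of 10 is arbitrary; choose it from your own QA diagnostics.
starts = reduce_to_starts(reads, min_quality=10)
```

Everything else about the read (sequence, per-base qualities) is dropped at
this point, which is why the QA has to happen before this step.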

  As for how one justifies discarding duplicate reads, why not ask it the
other way around? How does one justify keeping them?  In either case, one
thing to do is to try to decide whether those duplicate reads represent
biological replicates (i.e. the same piece of DNA was selected twice) or are
more likely to represent PCR artifacts.  If the former, I would keep them; if
the latter, I would discard them.  For the example given, it is the latter.
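
  To make that decision less abstract, one could first count how often each
(chromosome, position, strand) combination occurs and only then decide
whether to collapse the duplicates. The sketch below is plain Python, not
ShortRead code; the function names and the toy data are hypothetical, and
the duplicate fraction it reports is only a crude summary, not a test for
PCR artifacts.

```python
from collections import Counter


def duplicate_summary(starts):
    """Count reads per (chrom, position, strand) and the duplicated fraction."""
    counts = Counter(starts)
    n_total = len(starts)
    n_unique = len(counts)
    # Fraction of reads that are extra copies of an already-seen start.
    dup_fraction = (n_total - n_unique) / n_total if n_total else 0.0
    return counts, dup_fraction


def drop_duplicates(starts):
    """Keep one read per (chrom, position, strand), in first-seen order."""
    return list(dict.fromkeys(starts))


# Toy data: three reads share the same start and orientation on chr1.
starts = [
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "+"),
    ("chr1", 100, "-"),
    ("chr2", 250, "-"),
]

counts, dup_fraction = duplicate_summary(starts)
unique_starts = drop_duplicates(starts)
```

Note that the same position on opposite strands is treated as two distinct
starts here, so only reads that agree in both position and orientation are
collapsed.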

 best wishes
   Robert


[email protected] wrote:
> Hello,
> 
> In preparation to analyse my own ChIP-seq data, I am trying to follow the 
> steps described in this sample workflow:
> 
> http://www.bioconductor.org/workshops/2008/SeattleNov08/ChIP-seq/workflow.pdf
> 
> The document starts by loading data that has been "reduced to a set of 
> alignment start positions (including orientation)".
> 
> Can somebody elaborate on that a little bit or, ideally, show it with one 
> example?
> 
> Also, as part of the reduction, the procedure "removed all duplicate reads 
> and applied a quality score cutoff". The score cutoff is fine but how is 
> removing duplicates justified?
> 
> Thank you,
> 
> Ivan
> 
> 
> 
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
[email protected]

