Dear bioc-sig-sequencing,

I would like to determine a cutoff (threshold) for a ChIP-seq experiment that corresponds to a chosen false discovery rate (FDR), following BasicChipSeq.pdf ("A ChIP-Seq Data Analysis", pages 6 and 7, EX 2, http://www.bioconductor.org/workshops/2009/SeattleNov09/ChIP-seq/BasicChipSeq.pdf).
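For illustration, here is one common, generic way to turn a control sample into an FDR-based depth cutoff. It is not necessarily the method used in EX 2 of the workshop. The sketch assumes you already have, for each candidate region ("island"), its maximum read depth in the ChIP sample (ctcf) and in the control (gfp). It also assumes the two libraries have already been equalized in size. The estimated FDR at cutoff t is the number of control islands with depth >= t divided by the number of ChIP islands with depth >= t. The code is Python rather than R and does not use any Bioconductor API. `empirical_fdr` and `smallest_cutoff` are hypothetical names.

```python
from typing import Optional, Sequence


def empirical_fdr(chip_depths: Sequence[int],
                  control_depths: Sequence[int],
                  cutoff: int) -> float:
    """Estimate the FDR at `cutoff` as
    (# control islands with depth >= cutoff) / (# ChIP islands with depth >= cutoff).
    Assumes both samples were normalized to the same number of reads."""
    n_chip = sum(1 for d in chip_depths if d >= cutoff)
    n_ctrl = sum(1 for d in control_depths if d >= cutoff)
    if n_chip == 0:
        # No ChIP islands are called, so there are no false discoveries.
        return 0.0
    return min(1.0, n_ctrl / n_chip)


def smallest_cutoff(chip_depths: Sequence[int],
                    control_depths: Sequence[int],
                    target_fdr: float = 0.05) -> Optional[int]:
    """Return the smallest observed ChIP depth whose estimated FDR is
    <= target_fdr, or None if no such cutoff exists."""
    for t in sorted(set(chip_depths)):
        if empirical_fdr(chip_depths, control_depths, t) <= target_fdr:
            return t
    return None
```

The search only tries depths that actually occur in the ChIP sample. This guarantees that at least one island is called at every cutoff tested, and it yields the same set of called islands as any integer cutoff between two observed depths.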
After reading in the two files (ctcf and gfp), I have two AlignedRead objects. Before running the code on pages 6 and 7 for the ctcf and gfp data (to compare the distribution of depths with the null distributions), I would like to account for (equalize) any difference in the number of reads between the ctcf and gfp data sets. Is there a recommended way to do this? For example:

1. One could use the R function 'sample' on whichever AlignedRead object (ctcf or gfp) has more reads, to draw a subset of reads equal in number to the smaller data set. This could be repeated, say, 3 times to control for sampling variation when determining the cutoff described above.

2. Alternatively, similar to slide 25 of the workshop (CoverageEDA.pdf, http://bioconductor.org/packages/courses/seattle-01-2009/day3/CoverageEDA.pdf), one could find or write an R function that multiplies an Rle object (here the ctcf or gfp coverage, i.e. the depth at each nucleotide) by the fraction relating the numbers of reads in the two AlignedRead objects. The 'round' function would then be applied, as on slide 25, to give integer depth values in the Rle object. (Should the '2009' in this URL be '2010'?)

Can someone comment?

Thanks,
P. Terry
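Both options can be sketched in a few lines. The sketch below is written in Python purely for illustration and does not use the ShortRead or IRanges API. `downsample_reads` mirrors option 1: it samples without replacement, as R's `sample` does by default, and uses a different seed for each replicate. `scale_coverage` mirrors option 2, applied to a hypothetical run-length encoding given as (depth, run length) pairs that stands in for an Rle. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def downsample_reads(reads: Sequence, target: int, seed: int) -> list:
    """Option 1: draw `target` reads without replacement from the larger set."""
    rng = random.Random(seed)  # seeded so each replicate is reproducible
    return rng.sample(list(reads), target)


def replicate_downsamples(reads: Sequence, target: int,
                          n_reps: int = 3, base_seed: int = 0) -> List[list]:
    """Repeat option 1 `n_reps` times to gauge sampling variation."""
    return [downsample_reads(reads, target, base_seed + i) for i in range(n_reps)]


def scale_coverage(runs: List[Tuple[int, int]],
                   factor: float) -> List[Tuple[int, int]]:
    """Option 2: multiply each run's depth by `factor`, round to an integer,
    and merge adjacent runs that end up with the same depth."""
    scaled: List[Tuple[int, int]] = []
    for depth, length in runs:
        new_depth = round(depth * factor)  # Python rounds exact halves to even
        if scaled and scaled[-1][0] == new_depth:
            scaled[-1] = (new_depth, scaled[-1][1] + length)
        else:
            scaled.append((new_depth, length))
    return scaled
```

Suppose, for example, the ctcf data had 100 reads and the gfp data had 60. Option 1 would then be `replicate_downsamples(ctcf_reads, 60)`, and option 2 would apply `factor = 60 / 100` to the ctcf coverage. One difference between the two is worth noting. Option 1 keeps genuine read-level counting noise in the downsampled data. Option 2 is deterministic, but with a factor below 0.5 it rounds every depth-1 position down to 0.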
