I think there is a valid use case for a filter that excludes duplicate reads when we pool data from different runs and lanes.
We often load aliquots of the same PCR-pre-amplified biological sample onto two lanes. We then do a preliminary analysis and decide whether we need to run two more lanes in the next Solexa run, depending on how many reads we got and whether we have reached our target p-value. As a result, we may end up with several lanes and runs of exactly the same sample. In fact, I suspect this is the same reason readAligned has a filter to keep only unique reads: we need a large, non-redundant pool of reads. It would be nice to have that functionality in combineLaneReads too.

Thank you,
Ivan

----- Original Message ----
From: Deepayan Sarkar <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Thursday, 23 April, 2009 18:33:44
Subject: Re: [Bioc-sig-seq] Input from multiple Solexa runs

On Thu, Apr 23, 2009 at 3:22 PM, <[email protected]> wrote:
>
> Hi Deepayan,
>
> When I do
>
> control1 <- combineLaneReads(c(expt1_analysis1[c("1", "2")],
>                                expt1_analysis2[c("3", "4")]))
>
> is there a way to filter reads so that I only get one read per genomic
> position?

combineLaneReads is a very simple function:

combineLaneReads <- function(laneList, chromList = names(laneList[[1]]))
{
    names(chromList) = chromList ##to get the return value named
    GenomeData(lapply(chromList, function(chr) {
        list("+" = unlist(lapply(laneList, function(x) x[[chr]][["+"]]),
                          use.names = FALSE),
             "-" = unlist(lapply(laneList, function(x) x[[chr]][["-"]]),
                          use.names = FALSE))
    }))
}

and you can just wrap a unique() around the unlist() to make the start
positions unique.

But why would you want that? Within a lane, duplicates are likely to be
PCR artifacts, but for data from different lanes, aren't duplicates
more likely to be real?

We could easily add an argument to support this if you have a valid
use case.

-Deepayan

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
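For illustration, here is a minimal sketch of the option requested above, following Deepayan's suggestion of wrapping unique() around the unlist() calls. The function name combineLaneReadsSketch and the dropDuplicates argument are hypothetical names introduced here, not an existing API. To keep the sketch runnable without Bioconductor, it returns a plain named list instead of calling GenomeData(), and the toy lanes in the usage lines are made-up data.

```r
## Hypothetical variant of combineLaneReads with an opt-in duplicate filter.
## Returns a plain named list (chromosome -> list of "+" and "-" start
## positions) instead of a GenomeData object, so it runs without Bioconductor.
combineLaneReadsSketch <- function(laneList,
                                   chromList = names(laneList[[1]]),
                                   dropDuplicates = FALSE) {
    names(chromList) <- chromList  # so the result is named by chromosome
    ## Pool the start positions for one chromosome and strand across lanes,
    ## optionally keeping only one read per start position.
    pool <- function(chr, strand) {
        starts <- unlist(lapply(laneList, function(x) x[[chr]][[strand]]),
                         use.names = FALSE)
        if (isTRUE(dropDuplicates)) unique(starts) else starts
    }
    lapply(chromList, function(chr) {
        list("+" = pool(chr, "+"), "-" = pool(chr, "-"))
    })
}

## Usage with two made-up lanes of the same sample
lane1 <- list(chr1 = list("+" = c(100L, 250L), "-" = 300L))
lane2 <- list(chr1 = list("+" = c(100L, 400L), "-" = c(300L, 500L)))
res <- combineLaneReadsSketch(list(lane1, lane2), dropDuplicates = TRUE)
res$chr1[["+"]]  # 100 250 400
res$chr1[["-"]]  # 300 500
```

Because unique() is applied after pooling, the filter keeps one read per chromosome, strand and start position, collapsing duplicates both within and across lanes. Given Deepayan's point that cross-lane duplicates may be real, the sketch leaves the filter off by default.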
