I think there is a valid use case for filtering out duplicate reads when we 
pool data from different runs and lanes.

We often use aliquots of the exact same PCR pre-amplified biological sample in 
two lanes. Then we do a preliminary analysis, and we decide whether or not we 
need to run two more lanes in the next Solexa run. That depends on how many 
reads we got and whether we have reached our target p-value. As a result, we 
may end up with multiple lanes and runs of the exact same sample.

In fact, I think readAligned has a filter for keeping unique reads for the 
same reason: we need a large, non-redundant pool of reads. It would be nice 
to have that functionality in combineLaneReads too.
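
As a rough sketch of what such an option might look like: the function name 
combineLaneReadsUnique and the keepUnique argument are hypothetical, and plain 
nested lists stand in for GenomeData so that the example runs in base R. It 
pools start positions per chromosome and strand, as combineLaneReads does, and 
optionally keeps each position once.

```r
# Hypothetical variant of combineLaneReads with optional de-duplication.
# Assumption: each lane is a list of chromosomes, each holding "+" and "-"
# vectors of read start positions (plain lists instead of GenomeData).
combineLaneReadsUnique <- function(laneList,
                                   chromList = names(laneList[[1]]),
                                   keepUnique = TRUE) {
    names(chromList) <- chromList  # so the result is named by chromosome
    # Pool the start positions of one chromosome/strand across all lanes.
    pool <- function(chr, strand) {
        starts <- unlist(lapply(laneList, function(x) x[[chr]][[strand]]),
                         use.names = FALSE)
        if (keepUnique) unique(starts) else starts
    }
    lapply(chromList, function(chr) {
        list("+" = pool(chr, "+"), "-" = pool(chr, "-"))
    })
}

# Toy data: two lanes of the same sample sharing some start positions.
lane1 <- list(chr1 = list("+" = c(100L, 250L), "-" = c(400L)))
lane2 <- list(chr1 = list("+" = c(100L, 300L), "-" = c(400L, 500L)))
pooled <- combineLaneReadsUnique(list(lane1, lane2))
```

With keepUnique = FALSE the sketch simply concatenates the lanes, which matches 
the current behaviour, so the filter would stay optional.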

Thank you,

Ivan





----- Original Message ----
From: Deepayan Sarkar <[email protected]>
To: [email protected]
Cc: [email protected]
Sent: Thursday, 23 April, 2009 18:33:44
Subject: Re: [Bioc-sig-seq] Input from multiple Solexa runs

On Thu, Apr 23, 2009 at 3:22 PM,  <[email protected]> wrote:
>
> Hi Deepayan,
>
> When I do
>
> control1 <- combineLaneReads(c(expt1_analysis1[c("1", "2")],
> expt1_analysis2[c("3", "4")]))
>
> is there a way to filter reads so that I only get one read per genomic 
> position?

combineLaneReads is a very simple function:

combineLaneReads <- function(laneList, chromList = names(laneList[[1]])) {
    names(chromList) <- chromList  ## to get the return value named
    GenomeData(lapply(chromList, function(chr) {
        list("+" = unlist(lapply(laneList, function(x) x[[chr]][["+"]]),
                          use.names = FALSE),
             "-" = unlist(lapply(laneList, function(x) x[[chr]][["-"]]),
                          use.names = FALSE))
    }))
}

and you can just wrap a unique() around the unlist() to make the start
positions unique. But why would you want that? Within a lane,
duplicates are likely to be PCR artifacts, but for data from different
lanes, aren't duplicates more likely to be real? We could easily add
an argument to support this if you have a valid use-case.

-Deepayan





_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing