Hi,

I have been trying to use Rsamtools’ applyPileups function to compare two files 
position-by-position. In order to test it out, I simply ran:
minBaseQuality <- 20
minMapQuality <- 30
minDepth <- 10
maxDepth <- 1000
testparam <- PileupParam(what="seq",
                         which=GRanges(“chr20", IRanges(1, 1000000)),
                         minBaseQuality=minBaseQuality,
                         minMapQuality=minMapQuality,
                         minDepth=minDepth,
                         maxDepth=maxDepth,
)
fl1 <- "10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam"
fl2 <- "10696X4_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam"
r3 = applyPileups(PileupFiles(c(fl1, fl2)), function(x) x, param=testparam)

My understanding is that this should result in a three-dimensional array with 
“ACTGN Counts” in the first dimension, files in the second and position in the 
third. Positions with overlapping reads in both files should thus be collapsed 
into a single line in the third dimension. However, selecting one of these 
positions shows that they are duplicated:

r3[[1]][["seq"]][ , , r3[[1]][["pos"]] == 135003]
yields:

, , 1

  10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam 
10696X4_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam
A                                                   0                           
                        0
C                                                   0                           
                        0
G                                                  10                           
                        0
T                                                   0                           
                        0
N                                                   0                           
                        0

, , 2

  10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam 
10696X4_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam
A                                                   0                           
                        0
C                                                   0                           
                        0
G                                                   0                           
                       13
T                                                   0                           
                        0
N                                                   0                           
                        0

Even though the position is the same, it is showing up twice. Each time, one of 
the files shows zeroes. This is not consistent with what happens if the files 
are identical (as in the example from the help docs).

For example,

r3 = applyPileups(PileupFiles(c(fl1, fl1)), function(x) x, param=testparam) 
#file 1 entered twice
 r3[[1]][["seq"]][ , , r3[[1]][["pos"]] == 135003]

yields:

  10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam 
10696X1_140408_SN141_0782_BC4J3NACXX_2_STP1N90A.bam
A                                                   0                           
                        0
C                                                   0                           
                        0
G                                                  10                           
                       10
T                                                   0                           
                        0
N                                                   0                           
                        0

Is this the expected behavior? It seems like each position should only show up 
once in the output. Is there something I am missing?

Thanks,

Jonathon Hill
Postdoc
Yost Lab, University of Utah



        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to