[Bioc-sig-seq] potential bug either in strandFilter (ShortRead) or estimate.mean.fraglen (chipseq)

Johannes Rainer Fri, 05 Mar 2010 05:22:43 -0800

dear all,

I already reported that I have a problem using the estimate.mean.fraglen
function from the chipseq package using the method="correlation", finally I
think I found the reason for the problem:


short example: I filter my data set to unique reads on chromosomes 1-22, X
and Y and filter also for reads on + and - strand.

> chromFilter <- chromosomeFilter( "(^[1-9]|^X|^Y)" )
> strFilter <- strandFilter( c("+", "-") )
> theFilters <- compose( chromFilter, strFilter )
> AlignedReads <- AlignedReads[[1]]
> AlignedReads <- AlignedReads[ theFilters( AlignedReads ) ]
> uFilter <- uniqueFilter( withSread=FALSE )
> AlignedReads <- AlignedReads[ uFilter( AlignedReads ) ]
> AlignedReads
class: AlignedRead
length: 17860091 reads; width: 32 36 cycles
chromosome: 7 8 ... 1 15
position: 131934390 3275112 ... 64185954 45913614
strand: + + ... - -
alignQuality: NumericQuality
alignData varLabels: similar mismatch
> unique( strand( AlignedReads ))
[1] + -
Levels: - + *

this was actually the first surprise with the levels of strand being - + *
although the object does only contain reads on + and - strand. this was then
also the cause for the error in the  estimate.mean.fraglen function:

> Test <- estimate.mean.fraglen( AlignedReads, method="correlation" )
Error in if (from > from0 || to < to0) stop("[from, to] smaller than support
not implemented yet (but easy to add)") :
  missing value where TRUE/FALSE needed
Error in unlist(lapply(splitData, function(y) { :
  error in evaluating the argument 'x' in selecting a method for function
'unlist'

by looking at the source code of the chipseq package I finally realised that
the 3 levels of the strand are the problem, since the estimate.mean.fraglen
function defined for AlignedRead classes splits the reads based on the
strand (and "split" with "drop=FALSE" (the default) will split the data set
into 3 columns, one for each level):

setMethod("estimate.mean.fraglen", "AlignedRead",
          function(x, method = c("SISSR", "coverage", "correlation"), ...) {
              splitData <-
                split(data.frame(strand = strand(x),
                                 start =
                                 ifelse(strand(x) == "-",
                                        position(x) + width(x) - 1L,
                                        position(x))),
                      chromosome(x))
              unlist(lapply(splitData,
                            function(y) {
                                estimate.mean.fraglen(split(y[["start"]],
 ## splits reads
                                                            y[["strand"]]),
                                                      method = method, ...)
                            }))
           })



I suggest the following solutions:
1) either reduce the levels of strand( AlignedReads ) to only + and - after
applying a strandFilter, or
2) use split with the option drop=TRUE in the estinmate.mean.fraglen
function:

...
                            function(y) {
                                estimate.mean.fraglen(split(y[["start"]],
                                                            y[["strand"]],
                                                            drop=TRUE ),
                                                      method = method, ...)
                            }))
...



best regards, jo


-- 
Johannes Rainer, PhD
Bioinformatics Group,
Division Molecular Pathophysiology,
Biocenter, Medical University Innsbruck,
Fritz-Pregl-Str 3/IV, 6020 Innsbruck, Austria
and
Tyrolean Cancer Research Institute
Innrain 66, 6020 Innsbruck, Austria

Tel.:     +43 512 570485 13
Email:  [email protected]
           [email protected]
URL:   http://bioinfo.i-med.ac.at

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

[Bioc-sig-seq] potential bug either in strandFilter (ShortRead) or estimate.mean.fraglen (chipseq)

Reply via email to