Simon,
I have checked in changes to IRanges (now version 1.3.36), BioC 2.5 (devel), that improve the RangedData constructor to accept a RangesList object for its ranges object as well as added passed the element names when performing view* operations. I'm going to consider the aggregate use case some more and get back to you shortly on that one. Here is a revised example:

> suppressMessages(library(IRanges))
> track <- RangedData(
+     RangesList(
+        chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
+        chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4)) ),
+     score = c( 2, 7, 3, 1, 1, 1 ) )
> track
RangedData: 6 ranges by 1 column on 2 sequences
colnames(1): score
names(2): chrA chrB
> trackCoverage <- coverage(track, weight="score" )
> trackCoverage
SimpleRleList: 2 elements
names(2): chrA chrB
> exons <- RangesList(
+   chrA = IRanges( start = c(2, 4), width = c(2, 2) ),
+   chrB = IRanges( start = 3, width = 5 ) )
> exons
SimpleRangesList: 2 elements
names(2): chrA chrB
> exonView <- RleViewsList( rleList=trackCoverage, rangesList=exons )
> exonView
SimpleRleViewsList: 2 elements
names(2): chrA chrB
> viewSums(exonView)
SimpleIntegerList: 2 elements
names(2): chrA chrB
> viewSums(exonView)[["chrA"]]
[1]  4 14
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
i386-apple-darwin9.7.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.3.36



Patrick Aboyoun wrote:
Simon,
Answering your issues in turn:

Issue 1) Given this, it seems to me now that viewApply and coverage have very similar use cases. Do we need both?

Reponse 1) Yes. The viewApply function is a general purpose tool for implementing a function over a set of Views. The coverage function performs a (weighted) depth count of interval/ranges over a chosen domain. The viewSums, viewMaxs, etc. are specializations of the viewApply function that are built for performance reasons, much like the colSums, etc. functions are for matrices.


Issue 2) I am still quite worried that nobody outside the Hutch will be able to figure out which class to use when. There are simply so many classes in IRanges. Do you have a class hierarchy diagram?

Response 2) As a first step Michael and I have fleshed out the IRanges vignette

http://bioconductor.org/packages/2.5/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf

to explain more about the IRanges make up. The next step in the documentation is to map the entities in biological sequence analysis to IRanges classes. The chipseq package used an amorphous GenomeData class for this purpose, but that is just a glorified list that in one context can store coverage information and in another context can store exon boundaries. I have been trying to revise the chipseq package to be more specific on what data type should represent what sequence analysis entity. So for example, an RleList object is a natural representation of mapped read coverage information across chromosomes/contigs. If you want to associate exon boundaries to coverage data, then RleViewsList objects serve as a natural representation. Once we list out all of the sequence analysis entities, we can create a two column table mapping the entity to the Bioconductor class. We can then create something akin to a UML sequence diagram to show how you get from one object type to another in a typical sequence analysis workflow.


Issue 3) The RangedData constructor isn't intuitive. The following command

v3b <- RangedData(
    RangesList(
    chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
   chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4)) ),
   score = c( 2, 7, 3, 1, 1, 1 ) )

doesn't work as expected.

Response 3) The constructor wasn't general enough. I will fix this shortly.


Issue 4) The aggregate method used for *List objects aren't vectorized over list elements and the error message isn't useful.

Reponse 4) I need to think about this one for a bit because one could feasibly wish to aggregate across list elements and not within them. Perhaps I could check the nature of the by argument and if it is a "list", then aggregate within elements. Otherwise, aggregate between elements as it currently does.


Issue 5) The view* functions don't keep element names from original object.

Reponse 5) This was an oversight on my part and I will correct this shortly.


I'll send out an e-mail to the group once I check in these changes.



Patrick




Simon Anders wrote:
Hi Patrick

Patrick Aboyoun wrote:
I've made further changes to the IRanges package to make it easier to
perform the coverage operations across chromosomes/contigs. I've added
RleViewsList (virtual) and SimpleRleViewsList (non-virtual) classes to
IRanges as well as a RleViewsList constructor and viewSums and other
view* summary methods for RleViewList objects. The code below looks much cleaner now due to the RleViewsList constructor and the viewSums method.
Also, I point out the viewApply function, which serves in the capacity
of the aggregate function that you were experimenting with in your
e-mails. Do these steps fit into your workflow?

Given this, it seems to me now that viewApply and coverage have very
similar use cases. Do we need both?

I am still quite worried that nobody outside the Hutch will be able to
figure out which class to use when. There are simply so many classes in
IRanges. Do you have a class hierarchy diagram?

A justification for all the RangedData-style classes is, of course, that
one usually deals with multiple chromosomes. So, I went through the
example of my last mail again, now with two chromosomes.

Last time, we had this here, a RangedData with a single, unnamed
sequence, and one column, 'score':

v3 <- RangedData(
+    IRanges( start = c( 1, 4, 6 ), width = c( 3, 2, 4 ) ),
+    score = c( 2, 7, 3 ) )
v3
RangedData: 3 ranges by 1 column on 1 sequence
colnames(1): score

Now, I attempt to make a RangedData with two sequences, 'chrA' and 'chrB':

v3b <- RangedData(
+    RangesList(
+       chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),
+       chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4)) ),
+    score = c( 2, 7, 3, 1, 1, 1 ) )
v3b
RangedData: 6 ranges by 0 columns on 2 sequences
colnames(0):
names(2): chrA chrB

As you can see, the 'score' argument was ignored. I've tried a couple of
other syntaxes, but could not get it in. What's the trick?

To proceed, I add my scores manually:

v3b$score <- c( 2, 7, 3, 1, 1, 1 )

I still feel a bit uneasy about the fact that the split between the
sequences is not apparent here, but maybe it is simplest like this.

I now switch over to coverage vectors:

v2b <- coverage( v3b, weight="score" )

This has now nicely assembled everything over both chromosomes:

as.vector( v2b$chrA )
[1] 2 2 2 7 7 3 3 3 3
as.vector( v2b$chrB )
[1] 1 1 2 1 1 1 1 1 1

Let's get some exons, two on chrA and one on chrB:

exons <- RangesList(
+    chrA = IRanges( start = c(2, 4), width = c(2, 2) ),
+    chrB = IRanges( start = 3, width = 5 ) )

Now, 'aggregate' fails, with a rather misleading error message:

aggregate( v2b, exons, sum )
Error in solveWindowSEW(length(x), start = ifelse(is.null(start), NA,  :
  Invalid sequence coordinates.
  Please make sure the supplied 'start', 'end' and 'width' arguments
  are defining a region that is within the limits of the sequence.

So, I'll try with views:

v2exons <- RleViewsList( rleList=v2b, rangesList=exons )

viewSums( v2exons )
SimpleIntegerList: 2 elements

viewSums( v2exons )[[1]]
[1]  4 14
viewSums( v2exons )[[2]]
[1] 6

This worked, but we lost the sequence names:

viewSums( v2exons )[["chrA"]]
NULL
viewSums( v2exons )$chrA
NULL

I suppose that we again rely here on the sequences being in the same
order in v2b and exons, i.e., if one has 'chrA' and 'chrB', and the
other, say, 'chrB' and 'chrC', this would not get spotted, or would it?


Sorry for the length nit-picking mails but this gets all a bit complex...

Cheers
  Simon


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to