Hi Patrick Patrick Aboyoun wrote: > I've made further changes to the IRanges package to make it easier to > perform the coverage operations across chromosomes/contigs. I've added > RleViewsList (virtual) and SimpleRleViewsList (non-virtual) classes to > IRanges as well as a RleViewsList constructor and viewSums and other > view* summary methods for RleViewList objects. The code below looks much > cleaner now due to the RleViewsList constructor and the viewSums method. > Also, I point out the viewApply function, which serves in the capacity > of the aggregate function that you were experimenting with in your > e-mails. Do these steps fit into your workflow?
Given this, it seems to me now that viewApply and coverage have very similar use cases. Do we need both? I am still quite worried that nobody outside the Hutch will be able to figure out which class to use when. There are simply so many classes in IRanges. Do you have a class hierarchy diagram? A justification for all the RangedData-style classes is, of course, that one usually deals with multiple chromosomes. So, I went through the example of my last mail again, now with two chromosomes. Last time, we had this here, a RangedData with a single, unnamed sequence, and one column, 'score': > v3 <- RangedData( + IRanges( start = c( 1, 4, 6 ), width = c( 3, 2, 4 ) ), + score = c( 2, 7, 3 ) ) > v3 RangedData: 3 ranges by 1 column on 1 sequence colnames(1): score Now, I attempt to make a RangedData with two sequences, 'chrA' and 'chrB': > v3b <- RangedData( + RangesList( + chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)), + chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4)) ), + score = c( 2, 7, 3, 1, 1, 1 ) ) > v3b RangedData: 6 ranges by 0 columns on 2 sequences colnames(0): names(2): chrA chrB As you can see, the 'score' argument was ignored. I've tried a couple of other syntaxes, but could not get it in. What's the trick? To proceed, I add my scores manually: > v3b$score <- c( 2, 7, 3, 1, 1, 1 ) I still feel a bit uneasy about the fact that the split between the sequences is not apparent here, but maybe it is simplest like this. I now switch over to coverage vectors: > v2b <- coverage( v3b, weight="score" ) This has now nicely assembled everything over both chromosomes: > as.vector( v2b$chrA ) [1] 2 2 2 7 7 3 3 3 3 > as.vector( v2b$chrB ) [1] 1 1 2 1 1 1 1 1 1 Let's get some exons, two on chrA and one on chrB: > exons <- RangesList( + chrA = IRanges( start = c(2, 4), width = c(2, 2) ), + chrB = IRanges( start = 3, width = 5 ) ) Now, 'aggregate' fails, with a rather misleading error message: > aggregate( v2b, exons, sum ) Error in solveWindowSEW(length(x), start = ifelse(is.null(start), NA, : Invalid sequence coordinates. Please make sure the supplied 'start', 'end' and 'width' arguments are defining a region that is within the limits of the sequence. So, I'll try with views: > v2exons <- RleViewsList( rleList=v2b, rangesList=exons ) > viewSums( v2exons ) SimpleIntegerList: 2 elements > viewSums( v2exons )[[1]] [1] 4 14 > viewSums( v2exons )[[2]] [1] 6 This worked, but we lost the sequence names: > viewSums( v2exons )[["chrA"]] NULL > viewSums( v2exons )$chrA NULL I suppose that we again rely here on the sequences being in the same order in v2b and exons, i.e., if one has 'chrA' and 'chrB', and the other, say, 'chrB' and 'chrC', this would not get spotted, or would it? Sorry for the length nit-picking mails but this gets all a bit complex... Cheers Simon _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
