Let's generalize beyond space. When we have a RangesList by chromosome, we're really saying that our ranges are sliced up by a chromosome factor, so the names represent this factor. Each name should correspond to a level, so that all names are unique, and the names should be in the same order as the levels, which is what split() returns. As long as drop=FALSE (the default) is passed to split(), everything should work between lists split by a common factor.
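To make that convention concrete, here is a minimal base-R sketch. Plain vectors and lists stand in for the IRanges classes, and all variable names and values are made up for illustration; it only shows how split() with a common factor yields lists that line up position by position.

```r
# Hypothetical data: a chromosome factor with an unused level ("chrC").
chrom  <- factor(c("chrA", "chrB", "chrA"),
                 levels = c("chrA", "chrB", "chrC"))
scores <- c(2, 7, 3)
starts <- c(1L, 4L, 6L)

# drop = FALSE (the default) keeps one element per level, in level order,
# including an empty element for the unused level.
scores_by_chrom <- split(scores, chrom)
starts_by_chrom <- split(starts, chrom)

stopifnot(identical(names(scores_by_chrom), levels(chrom)))
stopifnot(identical(names(scores_by_chrom), names(starts_by_chrom)))

# A positional, parallel operation is safe here because both lists
# were split by the same factor.
paired <- Map(function(s, p) data.frame(start = p, score = s),
              scores_by_chrom, starts_by_chrom)

# A list assembled in a different order is not aligned; since the names
# are unique, it can be reordered by name before any positional operation.
reordered <- starts_by_chrom[c("chrB", "chrA", "chrC")]
realigned <- reordered[names(scores_by_chrom)]
stopifnot(identical(realigned, starts_by_chrom))
```

The reordering step relies on the names being unique, which is exactly the condition above; with duplicated names, subsetting by name silently picks the first match.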
So I agree with Patrick that all parallel list operations should be positional. But there are other types of operations. For example, searching, as performed by findOverlaps() and nearest(), will look for the query space in the subject/database spaces and match them up. There is a distinction between the two kinds of operation, and it's important for the user to understand that. It just needs to be documented a bit better, I guess.

Michael

On Sat, Jun 12, 2010 at 12:17 AM, Patrick Aboyoun <[email protected]> wrote:

> Janet,
> Most functions in the IRanges package follow the R convention of
> considering the elements of names to be loosely linked attributes rather
> than rigid keys. For convenience, functions such as $, [, [[ treat a list as
> a hash if it has names, but in most circumstances the names are ignored or
> copied without use. Even when there are names on elements, there are some
> odd corner cases that can cause problems. For example, if I wanted to have
> multiple list elements with the same name, then some important operations
> give unexpected results:
>
> > list(a = 1, a = 2)["a"]
> $a
> [1] 1
>
> If the issue is limited to enhancing the seqselect function to make it
> name-aware, it probably makes sense to go ahead with the enhancement. But the
> scope of this issue can grow quite large. For example, should names be used
> when adding to RleList objects? What should the following produce?
>
> RleList(a = Rle(1)) + RleList(a = Rle(2), a = Rle(3), b = Rle(4))
>
> Due to these types of ambiguities, I would rather focus on educating the
> user to be mindful that these are position-oriented rather than key-oriented
> objects and have them ensure that elements are in alignment.
>
> Thoughts?
>
>
> Patrick
>
>
> On 6/11/10 4:06 PM, Janet Young wrote:
>
>> Hi,
>>
>> I've been playing around with seqselect on scores stored in a
>> SimpleRleList object to get subregions defined in a RangesList object.
>>
>> I found a couple of things: first an enhancement request - would it be
>> possible to allow seqselect to deal with cases where not every space (name)
>> in the SimpleRleList has a corresponding space/name in the RangesList
>> object?
>>
>> The second is either a bug or else I've misunderstood the way seqselect is
>> supposed to work, in a dangerous way - it looks like seqselect doesn't use
>> the names of the list items to select scores, it just assumes that in the
>> two lists the elements have the same names in the same order.
>>
>> The code below should explain both issues much better than those
>> descriptions.
>>
>> thanks,
>>
>> Janet
>>
>>
>> > library(IRanges)
>>
>> Attaching package: 'IRanges'
>>
>> The following object(s) are masked from 'package:base':
>>
>>     cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
>>     rbind, rep.int, table
>>
>> >
>> > ### generate some arbitrary scores
>> > track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))) )
>> > trackCoverage <- coverage(track, weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )
>> >
>> > ### define subregions
>> > exons <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2,2)),chrB = IRanges(start = 3, width = 5))
>> >
>> > ### seqselect works if all spaces in trackCoverage have an element in exons
>> > seqselect(trackCoverage,exons )
>> SimpleRleList of length 2
>> $chrA
>> 'integer' Rle of length 4 with 2 runs
>>   Lengths: 2 2
>>   Values : 2 7
>>
>> $chrB
>> 'integer' Rle of length 5 with 2 runs
>>   Lengths: 1 4
>>   Values : 2 1
>>
>> >
>> > ### define subregions only on one chr
>> > exons_chrAonly <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2, 2)))
>> > ### now seqselect doesn't work if some spaces don't have any elements
>> > seqselect(trackCoverage,exons_chrAonly )
>> Error in seqselect(trackCoverage, exons_chrAonly) :
>>   'length(start)' must equal 'length(x)' when 'end' and 'width' are NULL
>> >
>> >
>> > ##### also, defining the regions with spaces in a different order seems to cause trouble as seqselect doesn't seem to be using the list's names - just going by order of elements
>> > exons_reorderchrs <- RangesList(chrB = IRanges(start = 3, width = 5),chrA = IRanges(start = c(2, 4), width = c(2,2)))
>> > seqselect(trackCoverage,exons_reorderchrs )
>> SimpleRleList of length 2
>> $chrA
>> 'integer' Rle of length 5 with 3 runs
>>   Lengths: 1 2 2
>>   Values : 2 7 3
>>
>> $chrB
>> 'integer' Rle of length 4 with 3 runs
>>   Lengths: 1 1 2
>>   Values : 1 2 1
>>
>> >
>> > identical ( seqselect(trackCoverage,exons ) , seqselect(trackCoverage,exons_reorderchrs ) )
>> [1] FALSE
>> >
>> > sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] IRanges_1.6.6
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
