On Sat, Jun 12, 2010 at 12:17 AM, Patrick Aboyoun <[email protected]> wrote:
> Janet,
>
> Most functions in the IRanges package follow the R convention of
> considering the elements of names to be loosely linked attributes rather
> than rigid keys. For convenience, functions such as $, [, and [[ treat a
> list as a hash if it has names, but in most circumstances the names are
> ignored or copied without use. Even when there are names on elements, there
> are some odd corner cases that can cause problems. For example, if I wanted
> to have multiple list elements with the same name, then some important
> operations give unexpected results:
>
> > list(a = 1, a = 2)["a"]
> $a
> [1] 1
>
> If the issue is limited to enhancing the seqselect function to make it name
> aware, it probably makes sense to go ahead with the enhancement. But the
> scope of this issue can grow quite large. For example, should names be used
> when adding RleList objects? What should the following produce?
>
> RleList(a = Rle(1)) + RleList(a = Rle(2), a = Rle(3), b = Rle(4))
>
> Because of these ambiguities, I would rather focus on educating users to be
> mindful that these are position-oriented rather than key-oriented objects,
> and have them ensure that the elements are in alignment.
>
> Thoughts?
>
>

Sometimes in IRanges the names do have special semantics: that of a "space".
I guess this is limited to RangesList. Other data structures, like RleList,
are often treated as being separated by space or chromosome, though their
names have never explicitly been treated as the space. This inconsistency is
probably OK, but it needs to be documented.

> Patrick
>
>
> On 6/11/10 4:06 PM, Janet Young wrote:
>
>> Hi,
>>
>> I've been playing around with seqselect on scores stored in a
>> SimpleRleList object to get subregions defined in a RangesList object.
>>
>> I found a couple of things. First, an enhancement request: would it be
>> possible to allow seqselect to deal with cases where not every space (name)
>> in the SimpleRleList has a corresponding space/name in the RangesList
>> object?
>>
>> The second is either a bug, or else I've misunderstood, in a dangerous way,
>> how seqselect is supposed to work: it looks like seqselect doesn't use the
>> names of the list items to select scores; it just assumes that the elements
>> of the two lists have the same names in the same order.
>>
>> The code below should explain both problems much better than those
>> descriptions.
>>
>> thanks,
>>
>> Janet
>>
>>
>>
>> > library(IRanges)
>>
>> Attaching package: 'IRanges'
>>
>> The following object(s) are masked from 'package:base':
>>
>>     cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
>>     rbind, rep.int, table
>>
>> >
>> > ### generate some arbitrary scores
>> > track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))) )
>> > trackCoverage <- coverage(track, weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )
>> >
>> > ### define subregions
>> > exons <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2,2)),chrB = IRanges(start = 3, width = 5))
>> >
>> > ### seqselect works if all spaces in trackCoverage have an element in exons
>> > seqselect(trackCoverage,exons )
>> SimpleRleList of length 2
>> $chrA
>> 'integer' Rle of length 4 with 2 runs
>> Lengths: 2 2
>> Values : 2 7
>>
>> $chrB
>> 'integer' Rle of length 5 with 2 runs
>> Lengths: 1 4
>> Values : 2 1
>>
>> >
>> > ### define subregions only on one chr
>> > exons_chrAonly <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2, 2)))
>> > ### now seqselect doesn't work if some spaces don't have any elements
>> > seqselect(trackCoverage,exons_chrAonly )
>> Error in seqselect(trackCoverage, exons_chrAonly) :
>>   'length(start)' must equal 'length(x)' when 'end' and 'width' are NULL
>> >
>> >
>> > ##### also, defining the regions with spaces in a different order seems to cause trouble as seqselect doesn't seem to be using the list's names - just going by order of elements
>> > exons_reorderchrs <- RangesList(chrB = IRanges(start = 3, width = 5),chrA = IRanges(start = c(2, 4), width = c(2,2)))
>> > seqselect(trackCoverage,exons_reorderchrs )
>> SimpleRleList of length 2
>> $chrA
>> 'integer' Rle of length 5 with 3 runs
>> Lengths: 1 2 2
>> Values : 2 7 3
>>
>> $chrB
>> 'integer' Rle of length 4 with 3 runs
>> Lengths: 1 1 2
>> Values : 1 2 1
>>
>> >
>> > identical ( seqselect(trackCoverage,exons ) , seqselect(trackCoverage,exons_reorderchrs ) )
>> [1] FALSE
>> >
>> > sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> i386-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] IRanges_1.6.6
>>
>> _______________________________________________
>> Bioc-sig-sequencing mailing list
>> [email protected]
>> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
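Patrick's suggestion that users keep the elements of the two lists in alignment themselves can be illustrated with a minimal base-R sketch. It assumes that plain named lists stand in for the SimpleRleList and RangesList objects, with integer vectors for the scores and integer index vectors for the regions. `alignByName()` is a hypothetical helper, not an IRanges function. It pairs elements by name and drops spaces that are missing from either list, which is one possible answer to Janet's enhancement request. It also refuses duplicated names, because `list(a = 1, a = 2)["a"]` silently returns only the first match.

```r
## Hypothetical helper (not part of IRanges): keep only the names shared by
## both lists and put them in the same order, so that a later positional,
## element-wise operation pairs up elements that really belong together.
alignByName <- function(x, y) {
  if (is.null(names(x)) || is.null(names(y)))
    stop("both lists must be named")
  ## Names only work as keys when they are unique; list(a = 1, a = 2)["a"]
  ## silently returns just the first element.
  if (anyDuplicated(names(x)) || anyDuplicated(names(y)))
    stop("names must be unique to be used as keys")
  common <- intersect(names(x), names(y))  # order follows names(x)
  list(x = x[common], y = y[common])
}

## Toy stand-ins: integer vectors play the role of the per-chromosome scores,
## and integer index vectors play the role of the per-chromosome regions.
scores  <- list(chrA = c(10L, 20L, 30L, 40L, 50L, 60L),
                chrB = 1:8)
regions <- list(chrB = 3:7, chrA = 2:5)   # same spaces, different order

## Positional pairing: the chrA scores are subset with the chrB indices
## (index 7 is past the end of chrA, so an NA appears).
byPosition <- Map(function(v, i) v[i], scores, regions)

## Name-based pairing: align first, then pair element-wise.
al <- alignByName(scores, regions)
byName <- Map(function(v, i) v[i], al$x, al$y)

## Regions for only one space: the unmatched space is simply dropped.
alA <- alignByName(scores, list(chrA = 2:5))
chrAonly <- Map(function(v, i) v[i], alA$x, alA$y)

print(byPosition)
print(byName)
print(chrAonly)
```

Dropping unmatched spaces is only one design choice. Returning a zero-length element for each space without regions would instead keep the result parallel to the scores list.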
