On 10/30/2013 10:07 AM, Michael Lawrence wrote:
On Wed, Oct 30, 2013 at 8:37 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

On 10/30/2013 07:55 AM, Michael Lawrence wrote:

On Tue, Oct 29, 2013 at 5:55 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

Hi Michael,

In Bioc < 2.13, subsetting was a mess. In particular, handling of list-like subscripts was rather unpredictable. It would work only if you were lucky enough to try it with one of the few supported types (like IntegerList, LogicalList, or IRangesList), but it didn't work for other very natural types like list or CharacterList. Or it would work for [ but not for [<-, or vice-versa:

  x <- splitAsList(letters[1:6], c(2, 4, 3, 2, 2, 4))
  x[list(1)]          # doesn't work in BioC < 2.13!
  x[list(1)] <- "XX"  # works in BioC < 2.13!

Or, if both [ and [<- worked, they could behave inconsistently: one would require the list-like subscript to have the same length as 'x' but the other wouldn't. Or one would use the names on the subscript and on 'x' to map the list elements between the two, but the other wouldn't.

Hopefully in BioC 2.13, subsetting behaves more consistently (at least that was the intention). For example now the names on the subscript and on 'x' are always used to map the list elements between the two:

  > x[list(`4`=2:1)]
  CharacterList of length 1
  [["4"]] f b

Also now, it's an error if the subscript has names but 'x' has not:

  > unname(x)[list(`4`=2:1)]
  Error in subsetListByList(x, i) :
    cannot subscript an unnamed list-like object by a named list-like object

(I should probably change this message for: "cannot subset an unnamed list-like object by a named list-like subscript".)
This is to be consistent with subsetting a Vector object by name, which fails if 'x' has no names:

  > IRanges(1:4, 5)["a"]
  Error in normalizeSingleBracketSubscript(i, x) :
    cannot subset by character when names are NULL

If the subscript is a list-like object with names, the assumption is that the user intended those names to be mapped against 'x' names.

Why make this assumption?

As I said, this is how [<- was behaving in Bioc < 2.13, but not [. When reunifying a choice has to be made, and I chose to make [ behave like [<- and not the other way around. For 3 reasons:

  1. It makes subsetting by a list-like object more flexible.
  2. It feels more consistent with what subsetting a Vector object by name does.
  3. It's also consistent with what findOverlaps() and subsetByOverlaps() have been doing for years on named RangesList objects.

Three users here have not made it and were surprised by the names on the index having any relevance to extraction.

The good news is that now that they've been surprised by this, they won't be surprised by the behavior of subsetByOverlaps() ;-) We cannot totally eliminate user surprises (depends too much on individual backgrounds), but we can minimize them by providing consistent behavior.

A long time ago we decided that the extraction should just happen in parallel, and that subsetByOverlaps on RangesList was a special case (it's already so different from basic [ extraction). Apparently, we forgot to take out the [<- stuff. So I would argue that the change should go the other way. Users will not expect the name-based matching. At least that was the consensus 4 years ago.
The consensus amongst who? AFAICT this behavior was not documented and no unit test broke when I modified List-wise extraction to match the names, so it looked more like a grey area to me than a conscious decision. Also none of the 100 or so software packages that depend directly or indirectly on IRanges seemed to be affected by this change. The reason for this is that the most common use case for List-wise extraction is something like this:

  > cvg <- RleList(chr1=Rle(c(0, 1, 0), c(10, 5, 8)), chr2=Rle(c(1, 0), c(6, 14)))
  > cvg[cvg >= 1]
  RleList of length 2
  $chr1
  numeric-Rle of length 5 with 1 run
    Lengths: 5
    Values : 1

  $chr2
  numeric-Rle of length 6 with 1 run
    Lengths: 6
    Values : 1

And this use case is not affected by whether or not List-wise extraction matches the names.

Thomas's use case is very unusual: the subscript looks like the result of a split() and, most of the time, I would expect this subscript to be used to subset an object that is also the result of a split() (by a split factor with the same levels). So the object to subset and the subscript would normally both end up with the same names. But in his case, the object to subset has no names, I don't know why.

If List-wise extraction matches the names, subsetting still does the right thing even if the split factors have levels not in the same order (or if some levels in the factor used to split the subscript are missing). If it doesn't match the names, the subsetting will make no sense and the user won't even know it. No surprise but wrong result.

H.
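To make the point about split factors concrete, here is a minimal base-R sketch using plain lists rather than IRanges List objects. It is only an illustration of the two matching strategies under discussion, not the IRanges implementation; the data and the names `values`, `x`, `i`, `by_name` and `by_position` are made up for the example.

```r
# Base-R sketch with plain lists (an assumption: no IRanges objects involved).
values <- c("a", "b", "c", "d", "e", "f")

# Object to subset: split by a factor whose levels are ordered 2, 3, 4.
# Gives `2` = a d e, `3` = c, `4` = b f.
x <- split(values, factor(c(2, 4, 3, 2, 2, 4), levels = c(2, 3, 4)))

# Subscript: within-group positions to keep, produced by a split whose
# levels happen to come out in a different order (4, 3, 2).
# Gives `4` = 1, `3` = 1, `2` = 2.
i <- split(c(1L, 1L, 2L), factor(c(4, 3, 2), levels = c(4, 3, 2)))

# Name-based matching: each subscript element is applied to the element
# of 'x' that carries the same name.
by_name <- Map(function(idx, nm) x[[nm]][idx], i, names(i))

# Position-based ("parallel") matching: names are dropped and the k-th
# subscript element is applied to the k-th element of 'x'.
by_position <- Map(function(elt, idx) elt[idx], unname(x), unname(i))

print(by_name)      # `4` = "b", `3` = "c", `2` = "d"
print(by_position)  # "a", "c", "f"
```

Both calls run without error, but only the name-based version returns the elements the subscript was built to select; the positional version silently pairs group 4's index with group 2's data.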
Others should chime in on this discussion. Should List-wise extraction and replacement match by the names of the List elements?

H.

If 'x' doesn't have names, I think it should fail rather than silently fall back to position-based mapping. So at least you give a chance to the user to either put names on 'x' (maybe s/he just forgot) or to remove them from the subscript. If we really want to fall back to position-based mapping, at least it should issue a warning, I think.

One thing I didn't change from pre-BioC-2.13 behavior is that a list-like subscript (when unnamed) is not recycled along 'x'. It's open to discussion whether this would be a good thing to have or not. Changing this would be pretty disruptive though...

Cheers,
H.

On 10/29/2013 03:51 PM, Michael Lawrence wrote:

I think we should just drop the names for the user. The Bioc <2.13 behavior seems reasonable to me. Please elaborate on the subtle issues. Most users would not expect the *names* on the index to have any effect on the extraction, in accordance with the behavior of ordinary vectors. The only difference with Lists is that there is a partitioning, which seems unrelated to naming.

Michael

On Tue, Oct 29, 2013 at 3:40 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

Hi Thomas,

For the same reasons that you cannot subset by names a Vector object with no names:

  > IRanges(1:4, width=10)[letters[1:4]]
  Error in normalizeSingleBracketSubscript(i, x) :
    cannot subset by character when names are NULL

you cannot subset an unnamed List object using a named list-like subscript.
So in your case, just remove the names on 'keep_ranges' (which are probably not desired anyway) before using it as a subscript:

  > keep_ranges
  CompressedIRangesList of length 18
  $`1`
  IRanges of length 1
      start end width
  [1]    20 108    89

  $`2`
  IRanges of length 1
      start end width
  [1]    43 131    89

  $`3`
  IRanges of length 1
      start end width
  [1]    21 105    85

  ...
  <15 more elements>

  > return_rles[ unname(keep_ranges) ]
  RleList of length 18
  [[1]]
  logical-Rle of length 89 with 1 run
    Lengths:   89
    Values : TRUE

  [[2]]
  logical-Rle of length 89 with 1 run
    Lengths:   89
    Values : TRUE

  [[3]]
  logical-Rle of length 85 with 1 run
    Lengths:   85
    Values : TRUE

  [[4]]
  logical-Rle of length 85 with 1 run
    Lengths:   85
    Values : TRUE

  [[5]]
  logical-Rle of length 102 with 1 run
    Lengths:  102
    Values : TRUE

  ...
  <13 more elements>

Prior to BioC 2.13, it was possible to subset an unnamed List object by a named list-like subscript, and in that case, the names on the subscript were ignored and the subscript was treated as parallel to the object to subset. However this behavior was somehow dangerous (could lead to subtle issues) and didn't follow the spirit of what subsetting an unnamed Vector by name does. So it's not supported anymore.

Sorry for the inconvenience,
H.

On 10/29/2013 03:05 PM, Thomas Sandmann wrote:

Hi Herve,

I have updated to IRanges 1.20.4 now, but unfortunately, I still encounter an error when I try to subset a CompressedRleList or SimpleRleList with a CompressedIRangesList or SimpleIRangesList.

Would you mind having a look at where I am going wrong? (My two example objects are available in the rdata object at the url shown below).
  con=url("http://dl.dropboxusercontent.com/u/126180/example.rdata")
  load( con )
  return_rles[ keep_ranges ]
  Error in subsetListByList(x, i) (from List-class.R#205) :
    cannot subscript an unnamed list-like object by a named list-like object

  R version 3.0.2 (2013-09-25)
  Platform: x86_64-unknown-linux-gnu (64-bit)

  locale:
   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
   [9] LC_ADDRESS=C               LC_TELEPHONE=C
  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

  attached base packages:
  [1] parallel  stats     graphics  grDevices utils     datasets  methods
  [8] base

  other attached packages:
   [1] trimPrimers_1.3.0    Rsamtools_1.14.1     Biostrings_2.30.0
   [4] GenomicRanges_1.14.2 XVector_0.2.0        IRanges_1.20.4
   [7] BiocGenerics_0.8.0   Defaults_1.1-1       BiocInstaller_1.12.0
  [10] roxygen2_2.2.2       digest_0.6.3         devtools_1.3

  loaded via a namespace (and not attached):
   [1] bitops_1.0-6   brew_1.0-6     compiler_3.0.2 evaluate_0.5.1 httr_0.2
   [6] memoise_0.1    RCurl_1.95-4.1 stats4_3.0.2   stringr_0.6.2  tools_3.0.2
  [11] whisker_0.3-2  zlibbioc_1.8.0

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel