On Wed, Oct 30, 2013 at 8:37 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

>
> On 10/30/2013 07:55 AM, Michael Lawrence wrote:
>
>>
>>
>>
>> On Tue, Oct 29, 2013 at 5:55 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>>     Hi Michael,
>>
>>     In Bioc < 2.13, subsetting was a mess. In particular, handling of
>>     list-like subscripts was rather unpredictable. It would work only
>>     if you were lucky enough to try it with one of the few supported
>>     types (like IntegerList, LogicalList, or IRangesList), but it didn't
>>     work for other very natural types like list or CharacterList.
>>     Or it would work for [ but not for [<-, or vice-versa:
>>
>>        x <- splitAsList(letters[1:6], c(2, 4, 3, 2, 2, 4))
>>        x[list(1)]            # doesn't work in BioC < 2.13!
>>        x[list(1)] <- "XX"    # works in BioC < 2.13!
>>
>>     Or, if both [ and [<- worked, they could behave inconsistently: one
>>     would require the list-like subscript to have the same length as 'x'
>>     but the other wouldn't. Or one would use the names on the subscript
>>     and on 'x' to map the list elements between the two, but the other
>>     wouldn't.
>>
>>     Hopefully in BioC 2.13, subsetting behaves more consistently (at least
>>     that was the intention). For example now the names on the subscript
>>     and on 'x' are always used to map the list elements between the two:
>>
>>        > x[list(`4`=2:1)]
>>        CharacterList of length 1
>>        [["4"]] f b
>>
>>     Also now, it's an error if the subscript has names but 'x' has not:
>>
>>        > unname(x)[list(`4`=2:1)]
>>        Error in subsetListByList(x, i) :
>>          cannot subscript an unnamed list-like object by a named
>>          list-like object
>>
>>     (I should probably change this message to: "cannot subset an unnamed
>>     list-like object by a named list-like subscript".)
>>
>>     This is to be consistent with subsetting a Vector object by name, which
>>     fails if 'x' has no names:
>>
>>        > IRanges(1:4, 5)["a"]
>>        Error in normalizeSingleBracketSubscript(i, x) :
>>          cannot subset by character when names are NULL
>>
>>     If the subscript is a list-like object with names, the assumption is
>>     that the user intended those names to be mapped against 'x' names.
>>
>>
>>
>> Why make this assumption?
>>
>
> As I said, this is how [<- was behaving in Bioc < 2.13, but not [.
> When unifying the two, a choice has to be made, and I chose to make [
> behave like [<- and not the other way around. For 3 reasons:
>   1. It makes subsetting by a list-like object more flexible.
>   2. It feels more consistent with what subsetting a Vector object
>      by name does.
>   3. It's also consistent with what findOverlaps() and
>      subsetByOverlaps() have been doing for years on named
>      RangesList objects.
>
>
>> Three users here have not made it and were
>> surprised by the names on the index having any relevance to extraction.
>>
>
> The good news is that now that they've been surprised by this, they
> won't be surprised by the behavior of subsetByOverlaps() ;-)
> We cannot totally eliminate user surprises (depends too much on
> individual backgrounds), but we can minimize them by providing
> consistent behavior.
>
>
A long time ago we decided that the extraction should just happen in
parallel, and that subsetByOverlaps on RangesList was a special case (it's
already so different from basic [ extraction). Apparently, we forgot to
take out the [<- stuff. So I would argue that the change should go the
other way. Users will not expect the name-based matching. At least that was
the consensus 4 years ago.

Others should chime in on this discussion. Should List-wise extraction and
replacement match by the names of the List elements?
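
To make the two options concrete, here is a minimal sketch on plain base R
lists. The helper names subset_parallel() and subset_by_name() are
hypothetical and only model the two semantics under discussion; they are not
the IRanges implementation, and the handling of a subscript shorter than 'x'
is an assumption of the sketch.

```r
# Hypothetical helpers on plain base R lists. They only model the two
# semantics under discussion; they are not the IRanges code.

# Parallel mapping: i[[k]] subsets x[[k]]; names on 'i' are ignored.
# Assumption of this sketch: a shorter 'i' pairs with the leading elements.
subset_parallel <- function(x, i) {
  stopifnot(length(i) <= length(x))
  Map(function(elt, idx) elt[idx], x[seq_along(i)], unname(i))
}

# Name-based mapping: a named 'i' is matched against names(x).
subset_by_name <- function(x, i) {
  if (is.null(names(i)))
    return(subset_parallel(x, i))
  if (is.null(names(x)))
    stop("cannot subset an unnamed list-like object by a named list-like subscript")
  stopifnot(all(names(i) %in% names(x)))
  Map(function(elt, idx) elt[idx], x[names(i)], i)
}

x <- split(letters[1:6], c(2, 4, 3, 2, 2, 4))  # names: "2", "3", "4"
i <- list(`4` = 2:1)

by_name  <- subset_by_name(x, i)    # picks from x[["4"]]: "f" "b"
parallel <- subset_parallel(x, i)   # picks from x[[1]]:   "d" "a"
```

With the same subscript, the two rules pick from different list elements,
which is the surprise being discussed here.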

> H.
>
>
>
>>     If 'x' doesn't have names, I think it should fail rather than silently
>>     fall back to position-based mapping. So at least you give a chance
>>     to the user to either put names on 'x' (maybe s/he just forgot) or to
>>     remove them from the subscript. If we really want to fall back to
>>     position-based mapping, at least it should issue a warning, I think.
>>
>>     One thing I didn't change from pre-BioC-2.13 behavior is that a
>>     list-like subscript (when unnamed) is not recycled along 'x'. It's
>>     open to discussion whether this would be a good thing to have or not.
>>     Changing this would be pretty disruptive though...
>>
>>     Cheers,
>>     H.
>>
>>
>>
>>     On 10/29/2013 03:51 PM, Michael Lawrence wrote:
>>
>>         I think we should just drop the names for the user. The Bioc <2.13
>>         behavior seems reasonable to me. Please elaborate on the subtle
>>         issues.
>>         Most users would not expect the *names* on the index to have any
>>         effect on the extraction, in accordance with the behavior of
>>         ordinary vectors.
>>         The only difference with Lists is that there is a partitioning,
>>         which seems unrelated to naming.
>>
>>         Michael
>>
>>
>>         On Tue, Oct 29, 2013 at 3:40 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
>>
>>              Hi Thomas,
>>
>>              For the same reason that you cannot subset by name a Vector
>>              object with no names:
>>
>>                 > IRanges(1:4, width=10)[letters[1:4]]
>>                 Error in normalizeSingleBracketSubscript(i, x) :
>>                   cannot subset by character when names are NULL
>>
>>              you cannot subset an unnamed List object using a named list-like
>>              subscript. So in your case, just remove the names on 'keep_ranges'
>>              (which are probably not desired anyway) before using it as a
>>              subscript:
>>
>>
>>                 > keep_ranges
>>                 CompressedIRangesList of length 18
>>                 $`1`
>>                 IRanges of length 1
>>                     start end width
>>                 [1]    20 108    89
>>
>>                 $`2`
>>                 IRanges of length 1
>>                     start end width
>>                 [1]    43 131    89
>>
>>                 $`3`
>>                 IRanges of length 1
>>                     start end width
>>                 [1]    21 105    85
>>
>>                 ...
>>                 <15 more elements>
>>
>>                 > return_rles[ unname(keep_ranges) ]
>>                 RleList of length 18
>>                 [[1]]
>>                 logical-Rle of length 89 with 1 run
>>                   Lengths:   89
>>                   Values : TRUE
>>
>>                 [[2]]
>>                 logical-Rle of length 89 with 1 run
>>                   Lengths:   89
>>                   Values : TRUE
>>
>>                 [[3]]
>>                 logical-Rle of length 85 with 1 run
>>                   Lengths:   85
>>                   Values : TRUE
>>
>>                 [[4]]
>>                 logical-Rle of length 85 with 1 run
>>                   Lengths:   85
>>                   Values : TRUE
>>
>>                 [[5]]
>>                 logical-Rle of length 102 with 1 run
>>                   Lengths:  102
>>                   Values : TRUE
>>
>>                 ...
>>                 <13 more elements>
>>
>>              Prior to BioC 2.13, it was possible to subset an unnamed List
>>              object by a named list-like subscript, and in that case, the names
>>              on the subscript were ignored and the subscript was treated as
>>              parallel to the object to subset. However, this behavior was
>>              somewhat dangerous (could lead to subtle issues) and didn't follow
>>              the spirit of what subsetting an unnamed Vector by name does. So
>>              it's not supported anymore.
>>
>>              Sorry for the inconvenience,
>>              H.
>>
>>
>>
>>              On 10/29/2013 03:05 PM, Thomas Sandmann wrote:
>>
>>                  Hi Herve,
>>
>>                  I have updated to IRanges 1.20.4 now, but unfortunately, I still
>>                  encounter an error when I try to subset a CompressedRleList or
>>                  SimpleRleList with a CompressedIRangesList or SimpleIRangesList.
>>
>>                  Would you mind having a look at where I am going wrong? (My two
>>                  example objects are available in the rdata object at the url
>>                  shown below).
>>
>>
>>                  con=url("http://dl.dropboxusercontent.com/u/126180/example.rdata")
>>                  load( con )
>>                  return_rles[ keep_ranges ]
>>
>>                  Error in subsetListByList(x, i) (from List-class.R#205) :
>>                      cannot subscript an unnamed list-like object by a named
>>                      list-like object
>>
>>                  R version 3.0.2 (2013-09-25)
>>                  Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>                  locale:
>>                     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>                     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>                     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>                     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>                     [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>                  [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>                  attached base packages:
>>                  [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>                  [8] base
>>
>>                  other attached packages:
>>                   [1] trimPrimers_1.3.0    Rsamtools_1.14.1     Biostrings_2.30.0
>>                   [4] GenomicRanges_1.14.2 XVector_0.2.0        IRanges_1.20.4
>>                   [7] BiocGenerics_0.8.0   Defaults_1.1-1       BiocInstaller_1.12.0
>>                  [10] roxygen2_2.2.2       digest_0.6.3         devtools_1.3
>>
>>                  loaded via a namespace (and not attached):
>>                   [1] bitops_1.0-6   brew_1.0-6     compiler_3.0.2 evaluate_0.5.1 httr_0.2
>>                   [6] memoise_0.1    RCurl_1.95-4.1 stats4_3.0.2   stringr_0.6.2  tools_3.0.2
>>                  [11] whisker_0.3-2  zlibbioc_1.8.0
>>
>>
>>              --
>>              Hervé Pagès
>>
>>              Program in Computational Biology
>>              Division of Public Health Sciences
>>              Fred Hutchinson Cancer Research Center
>>              1100 Fairview Ave. N, M1-B514
>>              P.O. Box 19024
>>              Seattle, WA 98109-1024
>>
>>              E-mail: hpa...@fhcrc.org
>>              Phone: (206) 667-5791
>>              Fax: (206) 667-1319
>>
>>              _______________________________________________
>>              Bioc-devel@r-project.org mailing list
>>              https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>>
>>
>>     --
>>     Hervé Pagès
>>
>>     Program in Computational Biology
>>     Division of Public Health Sciences
>>     Fred Hutchinson Cancer Research Center
>>     1100 Fairview Ave. N, M1-B514
>>     P.O. Box 19024
>>     Seattle, WA 98109-1024
>>
>>     E-mail: hpa...@fhcrc.org
>>     Phone: (206) 667-5791
>>     Fax: (206) 667-1319
>>
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
