On 10/30/2013 07:55 AM, Michael Lawrence wrote:



On Tue, Oct 29, 2013 at 5:55 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

    Hi Michael,

    In Bioc < 2.13, subsetting was a mess. In particular, handling of
    list-like subscripts was rather unpredictable. It would work only
    if you were lucky enough to try it with one of the few supported
    types (like IntegerList, LogicalList, or IRangesList), but it didn't
    work for other very natural types like list or CharacterList.
    Or it would work for [ but not for [<-, or vice-versa:

       x <- splitAsList(letters[1:6], c(2, 4, 3, 2, 2, 4))
       x[list(1)]            # doesn't work in BioC < 2.13!
       x[list(1)] <- "XX"    # works in BioC < 2.13!

    Or, if both [ and [<- worked, they could behave inconsistently: one
    would require the list-like subscript to have the same length as 'x'
    but the other wouldn't. Or one would use the names on the subscript
    and on 'x' to map the list elements between the two, but the other
    wouldn't.

    Hopefully in BioC 2.13, subsetting behaves more consistently (at least
    that was the intention). For example now the names on the subscript and
    on 'x' are always used to map the list elements between the two:

       > x[list(`4`=2:1)]
       CharacterList of length 1
       [["4"]] f b

    Also, it is now an error if the subscript has names but 'x' does not:

       > unname(x)[list(`4`=2:1)]
       Error in subsetListByList(x, i) :
         cannot subscript an unnamed list-like object by a named list-like object

    (I should probably change this message to: "cannot subset an unnamed
    list-like object by a named list-like subscript".)

    This is to be consistent with subsetting a Vector object by name, which
    fails if 'x' has no names:

       > IRanges(1:4, 5)["a"]

       Error in normalizeSingleBracketSubscript(i, x) :
         cannot subset by character when names are NULL

    If the subscript is a list-like object with names, the assumption is
    that the user intended those names to be mapped against 'x' names.



Why make this assumption?

As I said, this is how [<- was behaving in Bioc < 2.13, but not [.
When unifying the two, a choice had to be made, and I chose to make [
behave like [<- and not the other way around, for three reasons:
  1. It makes subsetting by a list-like object more flexible.
  2. It feels more consistent with what subsetting a Vector object
     by name does.
  3. It's also consistent with what findOverlaps() and
     subsetByOverlaps() have been doing for years on named
     RangesList objects.
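
As a rough illustration of the name-based mapping described in the reasons
above, here is a minimal base R sketch on plain lists. It is not the IRanges
implementation: subset_list_by_list() is a hypothetical helper, and treating
an unnamed subscript as parallel to the first elements of 'x' is an
assumption of the sketch.

```r
# Hypothetical helper (not part of IRanges): emulates subsetting a plain
# list 'x' by a list subscript 'i', mapping list elements by name.
subset_list_by_list <- function(x, i) {
  stopifnot(is.list(x), is.list(i))
  if (!is.null(names(i))) {
    # Named subscript: 'x' must have names too, otherwise fail loudly.
    if (is.null(names(x)))
      stop("cannot subset an unnamed list-like object by a named list-like subscript")
    idx <- match(names(i), names(x))
    if (anyNA(idx))
      stop("subscript has names not found in names(x)")
  } else {
    # Assumption of this sketch: an unnamed subscript is parallel to the
    # first length(i) elements of 'x' and is never recycled.
    if (length(i) > length(x))
      stop("unnamed list subscript is longer than 'x'")
    idx <- seq_along(i)
  }
  # Apply each inner subscript to the list element it was mapped to.
  ans <- Map(function(elt, sub) elt[sub], x[idx], i)
  names(ans) <- names(x)[idx]
  ans
}

# Same grouping as the splitAsList() example, but as a plain list.
x <- split(letters[1:6], c(2, 4, 3, 2, 2, 4))

subset_list_by_list(x, list(`4` = 2:1))            # element "4" is c("f", "b")
try(subset_list_by_list(unname(x), list(`4` = 2:1)))  # error: 'x' has no names
```

The point of the sketch is only that the names on the subscript select which
list elements are touched, while the inner integer vectors select within
each element.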

Three users here did not make that assumption and were surprised that
the names on the index had any relevance to extraction.

The good news is that now that they've been surprised by this, they
won't be surprised by the behavior of subsetByOverlaps() ;-)
We cannot totally eliminate user surprises (depends too much on
individual backgrounds), but we can minimize them by providing
consistent behavior.

H.



    If 'x' doesn't have names, I think it should fail rather than silently
    fall back to position-based mapping. So at least you give a chance
    to the user to either put names on 'x' (maybe s/he just forgot) or to
    remove them from the subscript. If we really want to fall back to
    position-based mapping, at least it should issue a warning, I think.
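
    The two options discussed above (fail, or fall back with a warning)
    can be contrasted in a small base R sketch. map_subscript() and its
    'fallback' argument are hypothetical names used for illustration only,
    not an existing IRanges function or option.

    ```r
    # Hypothetical helper: returns the positions in 'x' that the elements
    # of a list subscript 'i' are mapped to.
    map_subscript <- function(x, i, fallback = FALSE) {
      if (is.null(names(i)))
        return(seq_along(i))               # unnamed subscript: by position
      if (!is.null(names(x)))
        return(match(names(i), names(x)))  # both named: map by name
      if (!fallback)
        stop("cannot subset an unnamed list-like object by a named list-like subscript")
      # Fallback variant: position-based mapping, but never silently.
      warning("'x' has no names; ignoring names on the subscript")
      seq_along(i)
    }

    map_subscript(list(a = 1, b = 2), list(b = 1))             # maps to position 2
    try(map_subscript(list(1, 2), list(b = 1)))                # fails
    map_subscript(list(1, 2), list(b = 1), fallback = TRUE)    # warns, position 1
    ```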

    One thing I didn't change from pre-BioC-2.13 behavior is that a
    list-like subscript (when unnamed) is not recycled along 'x'. It's
    open to discussion whether this would be a good thing to have or not.
    Changing this would be pretty disruptive though...

    Cheers,
    H.



    On 10/29/2013 03:51 PM, Michael Lawrence wrote:

        I think we should just drop the names for the user. The Bioc <2.13
        behavior seems reasonable to me. Please elaborate on the subtle
        issues. Most users would not expect the *names* on the index to have
        any effect on the extraction, in accordance with the behavior of
        ordinary vectors. The only difference with Lists is that there is a
        partitioning, which seems unrelated to naming.
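
        For reference, the ordinary-vector behavior mentioned above can be
        checked in base R with a small example (the vector and index values
        are made up for illustration):

        ```r
        v <- c(a = 10, b = 20, c = 30)
        i <- c(c = 1L, a = 2L)   # a named integer index

        # Extraction uses the values of 'i' (positions 1 and 2);
        # the names on 'i' play no role.
        res <- v[i]
        print(res)
        ```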

        Michael


        On Tue, Oct 29, 2013 at 3:40 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

             Hi Thomas,

             For the same reasons that you cannot subset by names a Vector
             object with no names:

                > IRanges(1:4, width=10)[letters[1:4]]
                Error in normalizeSingleBracketSubscript(i, x) :
                  cannot subset by character when names are NULL

             you cannot subset an unnamed List object using a named list-like
             subscript. So in your case, just remove the names on
             'keep_ranges' (which are probably not desired anyway) before
             using it as a subscript:


                > keep_ranges
                CompressedIRangesList of length 18
                $`1`
                IRanges of length 1
                    start end width
                [1]    20 108    89

                $`2`
                IRanges of length 1
                    start end width
                [1]    43 131    89

                $`3`
                IRanges of length 1
                    start end width
                [1]    21 105    85

                ...
                <15 more elements>

                > return_rles[ unname(keep_ranges) ]
                RleList of length 18
                [[1]]
                logical-Rle of length 89 with 1 run
                  Lengths:   89
                  Values : TRUE

                [[2]]
                logical-Rle of length 89 with 1 run
                  Lengths:   89
                  Values : TRUE

                [[3]]
                logical-Rle of length 85 with 1 run
                  Lengths:   85
                  Values : TRUE

                [[4]]
                logical-Rle of length 85 with 1 run
                  Lengths:   85
                  Values : TRUE

                [[5]]
                logical-Rle of length 102 with 1 run
                  Lengths:  102
                  Values : TRUE

                ...
                <13 more elements>

             Prior to BioC 2.13, it was possible to subset an unnamed List
             object by a named list-like subscript, and in that case, the
             names on the subscript were ignored and the subscript was
             treated as parallel to the object to subset. However this
             behavior was somewhat dangerous (could lead to subtle issues)
             and didn't follow the spirit of what subsetting an unnamed
             Vector by name does. So it's not supported anymore.

             Sorry for the inconvenience,
             H.



             On 10/29/2013 03:05 PM, Thomas Sandmann wrote:

                 Hi Herve,

                 I have updated to IRanges 1.20.4 now, but unfortunately, I
                 still encounter an error when I try to subset a
                 CompressedRleList or SimpleRleList with a
                 CompressedIRangesList or SimpleIRangesList.

                 Would you mind having a look at where I am going wrong?
                 (My two example objects are available in the rdata object
                 at the url shown below).


                 con=url("http://dl.dropboxusercontent.com/u/126180/example.rdata")
                 load( con )
                 return_rles[ keep_ranges ]

                 Error in subsetListByList(x, i) (from List-class.R#205) :
                     cannot subscript an unnamed list-like object by a named list-like object

                 R version 3.0.2 (2013-09-25)
                 Platform: x86_64-unknown-linux-gnu (64-bit)

                 locale:
                    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
                    [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
                    [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
                    [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
                    [9] LC_ADDRESS=C               LC_TELEPHONE=C
                 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

                 attached base packages:
                 [1] parallel  stats     graphics  grDevices utils     datasets  methods
                 [8] base

                 other attached packages:
                  [1] trimPrimers_1.3.0    Rsamtools_1.14.1     Biostrings_2.30.0
                  [4] GenomicRanges_1.14.2 XVector_0.2.0        IRanges_1.20.4
                  [7] BiocGenerics_0.8.0   Defaults_1.1-1       BiocInstaller_1.12.0
                 [10] roxygen2_2.2.2       digest_0.6.3         devtools_1.3

                 loaded via a namespace (and not attached):
                  [1] bitops_1.0-6   brew_1.0-6     compiler_3.0.2 evaluate_0.5.1 httr_0.2
                  [6] memoise_0.1    RCurl_1.95-4.1 stats4_3.0.2   stringr_0.6.2  tools_3.0.2
                 [11] whisker_0.3-2  zlibbioc_1.8.0


             --
             Hervé Pagès

             Program in Computational Biology
             Division of Public Health Sciences
             Fred Hutchinson Cancer Research Center
             1100 Fairview Ave. N, M1-B514
             P.O. Box 19024
             Seattle, WA 98109-1024

             E-mail: hpa...@fhcrc.org
             Phone: (206) 667-5791
             Fax: (206) 667-1319

             _______________________________________________
             Bioc-devel@r-project.org mailing list
             https://stat.ethz.ch/mailman/listinfo/bioc-devel







