On 10/30/2013 10:07 AM, Michael Lawrence wrote:



On Wed, Oct 30, 2013 at 8:37 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:


    On 10/30/2013 07:55 AM, Michael Lawrence wrote:




        On Tue, Oct 29, 2013 at 5:55 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

             Hi Michael,

             In Bioc < 2.13, subsetting was a mess. In particular, handling of
             list-like subscripts was rather unpredictable. It would work only
             if you were lucky enough to try it with one of the few supported
             types (like IntegerList, LogicalList, or IRangesList), but it didn't
             work for other very natural types like list or CharacterList.
             Or it would work for [ but not for [<-, or vice-versa:

                x <- splitAsList(letters[1:6], c(2, 4, 3, 2, 2, 4))
                x[list(1)]            # doesn't work in BioC < 2.13!
                x[list(1)] <- "XX"    # works in BioC < 2.13!

             Or, if both [ and [<- worked, they could behave inconsistently: one
             would require the list-like subscript to have the same length as 'x'
             but the other wouldn't. Or one would use the names on the subscript
             and on 'x' to map the list elements between the two, but the other
             wouldn't.

             Hopefully in BioC 2.13, subsetting behaves more consistently (at least
             that was the intention). For example now the names on the subscript and
             on 'x' are always used to map the list elements between the two:

                > x[list(`4`=2:1)]
                CharacterList of length 1
                [["4"]] f b

             Also now, it's an error if the subscript has names but 'x' has not:

                > unname(x)[list(`4`=2:1)]
                Error in subsetListByList(x, i) :
                  cannot subscript an unnamed list-like object by a named
                  list-like object

             (I should probably change this message for: "cannot subset an unnamed
             list-like object by a named list-like subscript".)

             This is to be consistent with subsetting a Vector object by name, which
             fails if 'x' has no names:

                > IRanges(1:4, 5)["a"]
                Error in normalizeSingleBracketSubscript(i, x) :
                  cannot subset by character when names are NULL

             If the subscript is a list-like object with names, the assumption is
             that the user intended those names to be mapped against 'x' names.



        Why make this assumption?


    As I said, this is how [<- was behaving in Bioc < 2.13, but not [.
    When unifying the two, a choice had to be made, and I chose to make [
    behave like [<- rather than the other way around, for 3 reasons:
       1. It makes subsetting by a list-like object more flexible.

       2. It feels more consistent with what subsetting a Vector object
          by name does.
       3. It's also consistent with what findOverlaps() and
          subsetByOverlaps() have been doing for years on named
          RangesList objects.


        Three users here have not made it and were
        surprised by the names on the index having any relevance to
        extraction.


    The good news is that now that they've been surprised by this, they
    won't be surprised by the behavior of subsetByOverlaps() ;-)
    We cannot totally eliminate user surprises (depends too much on
    individual backgrounds), but we can minimize them by providing
    consistent behavior.


A long time ago we decided that the extraction should just happen in
parallel, and that subsetByOverlaps on RangesList was a special case
(it's already so different from basic [ extraction). Apparently, we
forgot to take out the [<- stuff. So I would argue that the change
should go the other way. Users will not expect the name-based matching.
At least that was the consensus 4 years ago.

The consensus among whom? AFAICT this behavior was not documented
and no unit test broke when I modified List-wise extraction to match
the names, so it looked more like a grey area to me than a conscious
decision.

Also none of the 100 or so software packages that depend directly or
indirectly on IRanges seemed to be affected by this change. The reason
for this is that the most common use case for List-wise extraction is
something like this:

  > cvg <- RleList(chr1=Rle(c(0, 1, 0), c(10, 5, 8)),
                   chr2=Rle(c(1, 0), c(6, 14)))

  > cvg[cvg >= 1]
  RleList of length 2
  $chr1
  numeric-Rle of length 5 with 1 run
    Lengths: 5
    Values : 1

  $chr2
  numeric-Rle of length 6 with 1 run
    Lengths: 6
    Values : 1

This use case is not affected by whether List-wise extraction matches
the names. Thomas's use case is very unusual: the subscript looks like
the result of a split() and, most of the time, I would expect such a
subscript to be used to subset an object that is also the result of a
split() (by a split factor with the same levels). So the object to
subset and the subscript would normally both end up with the same
names. In his case, though, the object to subset has no names; I don't
know why. If List-wise extraction matches the names, subsetting still
does the right thing even if the levels of the two split factors are not
in the same order (or if some levels are missing from the factor used to
split the subscript). If it doesn't match the names, the subsetting will
make no sense and the user won't even know it: no surprise, but a wrong
result.
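
To make the point about split factors concrete, here is a minimal sketch
using plain base-R lists rather than List objects. The two helpers,
subset_by_position() and subset_by_name(), are hypothetical names made up
for this illustration; they are not the IRanges implementation and only
mimic the two candidate behaviors under discussion.

```r
# Hypothetical helpers on plain base-R lists (NOT the IRanges code).

# Parallel extraction: element k of the subscript is applied to element k
# of 'x'; any names on the subscript are ignored.
subset_by_position <- function(x, i) {
  stopifnot(length(x) == length(i))
  out <- lapply(seq_along(x), function(k) x[[k]][i[[k]]])
  names(out) <- names(x)
  out
}

# Name-based extraction: each subscript element is applied to the element
# of 'x' with the same name. Fails loudly if names are missing.
subset_by_name <- function(x, i) {
  if (is.null(names(x)) || is.null(names(i)))
    stop("both 'x' and the subscript must have names")
  if (!all(names(i) %in% names(x)))
    stop("some names on the subscript are not in names(x)")
  out <- lapply(names(i), function(nm) x[[nm]][i[[nm]]])
  names(out) <- names(i)
  out
}

values <- c(5, 12, 7, 20, 3, 15)
group  <- c("a", "b", "a", "c", "b", "c")

# Same split factor, but with the levels in a different order.
x    <- split(values,     factor(group, levels = c("a", "b", "c")))
keep <- split(values > 6, factor(group, levels = c("c", "b", "a")))

by_name <- subset_by_name(x, keep)      # keeps only values > 6
by_pos  <- subset_by_position(x, keep)  # applies the "c" mask to group "a"

print(by_name)
print(by_pos)
```

With name-based matching every retained value is greater than 6. With
positional matching the call succeeds without any error or warning, but
group "a" still contains 5, because the mask computed for group "c" was
applied to it.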

H.


Others should chime in on this discussion. Should List-wise extraction
and replacement match by the names of the List elements?

    H.



             If 'x' doesn't have names, I think it should fail rather than silently
             fall back to position-based mapping. So at least you give a chance
             to the user to either put names on 'x' (maybe s/he just forgot) or to
             remove them from the subscript. If we really want to fall back to
             position-based mapping, at least it should issue a warning, I think.

             One thing I didn't change from pre-BioC-2.13 behavior is that a
             list-like subscript (when unnamed) is not recycled along 'x'. It's
             open to discussion whether this would be a good thing to have or not.
             Changing this would be pretty disruptive though...

             Cheers,
             H.



             On 10/29/2013 03:51 PM, Michael Lawrence wrote:

                 I think we should just drop the names for the user. The Bioc <2.13
                 behavior seems reasonable to me. Please elaborate on the subtle
                 issues.
                 Most users would not expect the *names* on the index to have any
                 effect on the extraction, in accordance with the behavior of
                 ordinary vectors.
                 The only difference with Lists is that there is a partitioning,
                 which seems unrelated to naming.

                 Michael


                 On Tue, Oct 29, 2013 at 3:40 PM, Hervé Pagès
                 <hpa...@fhcrc.org> wrote:

                      Hi Thomas,

                      For the same reasons that you cannot subset by names a
                      Vector object with no names:

                         > IRanges(1:4, width=10)[letters[1:4]]
                         Error in normalizeSingleBracketSubscript(i, x) :
                           cannot subset by character when names are NULL

                      you cannot subset an unnamed List object using a named
                      list-like subscript. So in your case, just remove the
                      names on 'keep_ranges' (which are probably not desired
                      anyway) before using it as a subscript:


                         > keep_ranges
                         CompressedIRangesList of length 18
                         $`1`
                         IRanges of length 1
                             start end width
                         [1]    20 108    89

                         $`2`
                         IRanges of length 1
                             start end width
                         [1]    43 131    89

                         $`3`
                         IRanges of length 1
                             start end width
                         [1]    21 105    85

                         ...
                         <15 more elements>

                         > return_rles[ unname(keep_ranges) ]
                         RleList of length 18
                         [[1]]
                         logical-Rle of length 89 with 1 run
                           Lengths:   89
                           Values : TRUE

                         [[2]]
                         logical-Rle of length 89 with 1 run
                           Lengths:   89
                           Values : TRUE

                         [[3]]
                         logical-Rle of length 85 with 1 run
                           Lengths:   85
                           Values : TRUE

                         [[4]]
                         logical-Rle of length 85 with 1 run
                           Lengths:   85
                           Values : TRUE

                         [[5]]
                         logical-Rle of length 102 with 1 run
                           Lengths:  102
                           Values : TRUE

                         ...
                         <13 more elements>

                      Prior to BioC 2.13, it was possible to subset an unnamed
                      List object by a named list-like subscript, and in that
                      case, the names on the subscript were ignored and the
                      subscript was treated as parallel to the object to
                      subset. However this behavior was somewhat dangerous
                      (could lead to subtle issues) and didn't follow the
                      spirit of what subsetting an unnamed Vector by name
                      does. So it's not supported anymore.

                      Sorry for the inconvenience,
                      H.



                      On 10/29/2013 03:05 PM, Thomas Sandmann wrote:

                          Hi Herve,

                          I have updated to IRanges 1.20.4 now, but
                          unfortunately, I still encounter an error when I try
                          to subset a CompressedRleList or SimpleRleList with a
                          CompressedIRangesList or SimpleIRangesList.

                          Would you mind having a look at where I am going
                          wrong? (My two example objects are available in the
                          rdata object at the url shown below).



        
                          con=url("http://dl.dropboxusercontent.com/u/126180/example.rdata")
                          load( con )
                          return_rles[ keep_ranges ]

                          Error in subsetListByList(x, i) (from List-class.R#205) :
                              cannot subscript an unnamed list-like object by a named
                              list-like object

                          R version 3.0.2 (2013-09-25)
                          Platform: x86_64-unknown-linux-gnu (64-bit)

                          locale:
                           [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
                           [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
                           [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
                           [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
                           [9] LC_ADDRESS=C               LC_TELEPHONE=C
                          [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

                          attached base packages:
                          [1] parallel  stats     graphics  grDevices utils     datasets  methods
                          [8] base

                          other attached packages:
                           [1] trimPrimers_1.3.0    Rsamtools_1.14.1     Biostrings_2.30.0
                           [4] GenomicRanges_1.14.2 XVector_0.2.0        IRanges_1.20.4
                           [7] BiocGenerics_0.8.0   Defaults_1.1-1       BiocInstaller_1.12.0
                          [10] roxygen2_2.2.2       digest_0.6.3         devtools_1.3

                          loaded via a namespace (and not attached):
                           [1] bitops_1.0-6   brew_1.0-6     compiler_3.0.2 evaluate_0.5.1 httr_0.2
                           [6] memoise_0.1    RCurl_1.95-4.1 stats4_3.0.2   stringr_0.6.2  tools_3.0.2
                          [11] whisker_0.3-2  zlibbioc_1.8.0


                      --
                      Hervé Pagès

                      Program in Computational Biology
                      Division of Public Health Sciences
                      Fred Hutchinson Cancer Research Center
                      1100 Fairview Ave. N, M1-B514
                      P.O. Box 19024
                      Seattle, WA 98109-1024

                      E-mail: hpa...@fhcrc.org
                      Phone:  (206) 667-5791
                      Fax:    (206) 667-1319

                      _______________________________________________
                      Bioc-devel@r-project.org mailing list
                      https://stat.ethz.ch/mailman/listinfo/bioc-devel









--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
