Re: [Bioc-devel] deprecation of keepSeqlevels and renameSeqlevels

Michael Lawrence Tue, 21 May 2013 18:38:19 -0700

It's clear we're not going to agree. I will volunteer to maintain these,
OK? I will maintain them anyway for my own use.


Michael


On Tue, May 21, 2013 at 2:34 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:

>
>
> On 05/21/2013 08:00 AM, Michael Lawrence wrote:
>
>>
>>
>>
>> On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org
>> <mailto:hpa...@fhcrc.org>> wrote:
>>
>>     Michael,
>>
>>
>>     On 05/20/2013 08:44 PM, Michael Lawrence wrote:
>>
>>
>>
>>
>>         On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès <hpa...@fhcrc.org
>>         <mailto:hpa...@fhcrc.org>
>>         <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>> wrote:
>>
>>              Hi,
>>
>>
>>              On 05/20/2013 03:15 PM, Michael Lawrence wrote:
>>
>>                  On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
>>                  <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
>>         <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>> wrote:
>>
>>                      Hi Michael --
>>
>>
>>                      On 5/20/2013 5:56 AM, Michael Lawrence wrote:
>>
>>                          While it's cool that seqlevels<- has become so
>>         flexible,
>>                          I still claim
>>                          that
>>                          rename/keep/drop would be a lot easier to read,
>>         because
>>                          they describe the
>>                          high-level operation, and there's no reason to
>>         deprecate
>>                          them. They're
>>                          also
>>                          easier to remember. For example, for dropping
>> with
>>                          seqlevels<-, the user
>>                          needs
>>                          to remember that "force" is necessary to drop.
>> Much
>>                          easier to just say
>>                          "dropSeqlevels(), please". And reimplementing
>>                          keepSeqlevels is still too
>>                          complicated. Is it such a maintenance burden to
>>         have
>>                          these simple
>>                          wrappers sit
>>                          on top of the low-level manipulators?
>>
>>                          Another suggestion: renameSeqlevels should not
>>         require a
>>                          named vector (in
>>                          cases
>>                          where it is unnamed, require that it is
>>         parallel to the
>>                          existing
>>                          seqlevels, as
>>                          with seqlevels<-).
>>
>>
>>                      I didn't really indicate what drove my desire to see
>>                      keepSeqlevels
>>                      deprecated. keepSeqlevels, seqlevels<-, and
>>         isActiveSeq were
>>                      developed more
>>                      or less independently, and have different
>>         contracts. The
>>                      different
>>                      contracts are confusing to the user, as is creating
>>         a usable
>>                      help system
>>                      when there are multiple end points for what is a
>> common
>>                      operation. The help
>>                      pages of each were inadequate in different ways.
>>         And because
>>                      the code was
>>                      developed independently, support for different
>>         objects was
>>                      inconsistent. So
>>                      actually, yes, the maintenance (and use) burden for
>> the
>>                      previous state of
>>                      affairs was substantial.
>>
>>                      On the other hand, I agree that keepSeqlevels is
>>         convenient
>>                      as a simple
>>                      wrapper around seqlevels<-, in the same way that
>>         setNames
>>                      and names<- are
>>                      both useful.
>>
>>                      So we could iterate to
>>
>>                          keepSeqlevels <-
>>                              function(x, value, ...)
>>                          {
>>                              seqlevels(x, force=TRUE) <- value
>>                              x
>>                          }
>>
>>                      but I'd be less enthusiastic about maintaining the
>>         original
>>                      contract of
>>                      keepSeqlevels, where keepSeqlevels(gr1, gr2), would
>>         have
>>                      worked for two
>>                      GRanges objects.
>>
>>
>>              Why would this be called keepSeqlevels() if, depending on
>>         how it's used,
>>              it will either add, drop, rename, or permute the seqlevels?
>>              Couldn't this be called setSeqlevels?
>>
>>
>>         I thought that new2old was necessary for permuting.
>>
>>
>>     The seqlevels setter has no 'new2old' arg.
>>
>>
>>         You're right that
>>         adding should be disallowed for keepSeqlevels(). Adding
>>         seqlevels is not
>>         a common operation. The two common operations are:
>>         * Permuting, usually because the data were imported in
>> non-canonical
>>         order (seqnameStyle was designed to address this, no?).
>>
>>
>>     Permuting is straightforward with seqlevels<-:
>>
>>        > seqlevels(gr)
>>        [1] "chr1" "chr2" "chr3"
>>
>>        > seqlevels(gr) <- rev(seqlevels(gr))
>>
>>        > seqlevels(gr)
>>        [1] "chr3" "chr2" "chr1"
>>
>>
>>         * Subsetting, either by keeping or dropping.
>>
>>
>>     Also straightforward:
>>
>>        > seqlevels(gr, force=TRUE) <- "chr2"
>>
>>        > seqlevels(gr)
>>        [1] "chr2"
>>
>>
>>         Two main reasons: taking a
>>         small slice of the data for prototyping, or removing problematic
>>         chromosomes (sex, circular). This goes back to bsapply and the
>>         exclude
>>         argument.
>>
>>
>>     There are other important use cases:
>>
>>        - Dropping seqlevels that are *not* in use. This happens for
>> example
>>          when subsetting a BAM file with Rsamtools::filterBam to keep only
>>          the alignments located on a given chromosome. The entire sequence
>>          dictionary (in the header) is propagated to the resulting BAM
>> file
>>          so when you read the file with readGAlignments() you end up with
>> a
>>          bunch of useless seqlevels that you generally start by dropping.
>>          You don't want to drop any alignment so force=FALSE is your
>> friend.
>>
>>
>> This is sort of tangential to the discussion, but do you really want to
>> do this? I would preserve the universe as given by the BAM.
>>
>>        - An even more common operation is renaming: 90% of the times
>> that's
>>          what I use seqlevels<- for, and that's what I tell people to use
>>          when they need to rename their chromosomes. Also straightforward:
>>
>>            > seqlevels(gr)
>>            [1] "chr1" "chr2" "chr3" "chrM"
>>
>>            > seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
>>            > seqlevels(gr)
>>            [1] "1" "2" "3" "M"
>>
>>            > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
>>            > seqlevels(gr)
>>            [1] "1"  "2"  "3"  "MT"
>>
>>          Note that this is simpler to use than renameSeqlevels() which
>>          always required you to build the entire right value as a named
>>          vector.
>>
>>
>> Sure, renaming is a common use case. Not sure how I forgot that.
>>
>> As I wrote earlier in the thread, renameSeqlevels should be changed so
>> as not to require naming of the vector.
>>
>>     So the added-value of keepSeqlevels() seems to boil down to:
>>
>>        (a) it always uses force=TRUE
>>
>>        (b) it preserves the original object and returns the modified
>> object
>>
>>     If you want to restrict the use of keepSeqlevels() to permuting and
>>     dropping, you'll also have to disallow renaming (in addition to
>> adding).
>>     Then its name will more or less reflect what it does (if "keep" means
>>     "permute" or "drop"). The final result will be something that does
>> less
>>     than setSeqlevels() and that is not necessarily easier to read: both
>>     will set on the object whatever vector of seqlevels they are supplied,
>>     except that one will fail when doing so would actually result in
>> adding
>>     or renaming seqlevels.
>>
>>
>> So it looks like seqlevels<- is pretty powerful. Now I see that if I
>> scroll down to the examples in the man page, I find the actual
>> documentation. It's cool to learn that we can replace seqlevels on TxDb
>> objects.
>>
>
> Has been working on TranscriptDb objects for more than
>
>
>  My argument has always been that since these low-level
>> operations are so powerful, it's nice to have high-level operations to
>> clarify the code. We have to think hard to know what the RHS will do in
>> that replacement.
>>
>
> When you do something like seqlevels(gr) <- new_seqlevels, you're saying
> "can you please replace the current seqlevels by those new ones".
> That's all. The request/contract is simple and all the complexity is
> taken care of internally by seqlevels<-. It might be that, depending on
> what's currently on 'gr', this will actually result in dropping, adding,
> permuting. Or *a combination of the 3*. Like here:
>
>
>   > seqlevels(gr)
>   [1] "chr1" "chr2" "chr3"
>
>   > seqlevels(gr) <- c("chr4", "chr3", "chr1")
>   > seqlevels(gr)
>   [1] "chr4" "chr3" "chr1"
>
>   # one seqlevel was drop, one was added, and the 2 seqlevels that were
>   # preserved were reordered.
>
> When you want to subset a vector to keep all elements with an even
> index you do x[c(FALSE, TRUE)]. Is that keeping or dropping elements?
> Do we need to have a subsetting operator for keeping elements, another
> one for dropping elements, and another one for permuting them (like
> in x[sample(length(x))])?
>
> H.
>
>  It's probably one of the most complex replacement
>> operations; far more complex than the typical one, including levels<- on
>> factor. There's nothing wrong with that; it's just complexity that we
>> should be able to hide.
>>
>> Michael
>>
>>     H.
>>
>>
>>         Michael
>>
>>
>>              H.
>>
>>
>>
>>                  Well, I wasn't even aware of that feature, so you've
>>         made your
>>                  point about
>>                  the documentation ;) Sounds like a good compromise to me.
>>
>>                  Thanks for understanding,
>>                  Michael
>>
>>
>>
>>                      Martin
>>
>>
>>                          Michael
>>
>>
>>
>>
>>
>>
>>                          On Sat, May 18, 2013 at 6:00 PM, Martin Morgan
>>                          <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
>>         <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>
>>                          <mailto:mtmor...@fhcrc.org
>>         <mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org
>>         <mailto:mtmor...@fhcrc.org>>>>
>>
>>                          wrote:
>>
>>                                On 05/18/2013 05:42 PM, Martin Morgan
>> wrote:
>>
>>                                    Some of the most common operations are
>>                          straight-forward, e.g.,
>>
>>                                          > gr = GRanges(paste0("chr",
>>         c(1:22,
>>                          "X", "Y")), IRanges(1,
>>                          100))
>>                                          > seqlevels(gr) = sub("chr",
>> "ch",
>>                          seqlevels(gr))
>>
>>
>>                                To get a more comprehensive example I
>>         should have
>>                          followed my own
>>                          advice and
>>                                grabbed from the help page!
>>
>>                                      ## Rename:
>>                                      seqlevels(txdb) <- sub("chr", "",
>>                          seqlevels(txdb))
>>                                      seqlevels(txdb)
>>
>>                                      seqlevels(txdb) <- paste0("CH",
>>         seqlevels(txdb))
>>                                      seqlevels(txdb)
>>
>>
>>         seqlevels(txdb)[seqlevels(__****____txdb) ==
>>
>>
>>                          "CHM"] <- "M"
>>
>>
>>                                      seqlevels(txdb)
>>
>>
>>                                --
>>                                Computational Biology / Fred Hutchinson
>>         Cancer
>>                          Research Center
>>                                1100 Fairview Ave. N.
>>                                PO Box 19024 Seattle, WA 98109
>>
>>                                Location: Arnold Building M1 B861
>>                                Phone: (206) 667-2793
>>         <tel:%28206%29%20667-2793> <tel:%28206%29%20667-2793>
>>                          <tel:%28206%29%20667-2793>
>>
>>
>>
>>
>>                      --
>>                      Dr. Martin Morgan, PhD
>>
>>                      Fred Hutchinson Cancer Research Center
>>                      1100 Fairview Ave. N.
>>                      PO Box 19024 Seattle, WA 98109
>>
>>
>>                           [[alternative HTML version deleted]]
>>
>>                  ______________________________**_____________________
>>         Bioc-devel@r-project.org 
>> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>
>> >
>>         <mailto:Bioc-devel@r-project._**_org
>>         <mailto:Bioc-devel@r-project.**org <Bioc-devel@r-project.org>>>
>>                  mailing list
>>         
>> https://stat.ethz.ch/mailman/_**___listinfo/bioc-devel<https://stat.ethz.ch/mailman/____listinfo/bioc-devel>
>>         
>> <https://stat.ethz.ch/mailman/**__listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>> >
>>
>>
>>                  
>> <https://stat.ethz.ch/mailman/**__listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
>>         
>> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>> >>
>>
>>
>>              --
>>              Hervé Pagès
>>
>>              Program in Computational Biology
>>              Division of Public Health Sciences
>>
>>              Fred Hutchinson Cancer Research Center
>>              1100 Fairview Ave. N, M1-B514
>>              P.O. Box 19024
>>              Seattle, WA 98109-1024
>>
>>              E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
>>         <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
>>
>>              Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>>         <tel:%28206%29%20667-5791>
>>              Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>>         <tel:%28206%29%20667-1319>
>>
>>
>>
>>     --
>>     Hervé Pagès
>>
>>     Program in Computational Biology
>>     Division of Public Health Sciences
>>     Fred Hutchinson Cancer Research Center
>>     1100 Fairview Ave. N, M1-B514
>>     P.O. Box 19024
>>     Seattle, WA 98109-1024
>>
>>     E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
>>     Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
>>     Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
>>
>>
>>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpa...@fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] deprecation of keepSeqlevels and renameSeqlevels

Reply via email to