Michael,

On 05/20/2013 08:44 PM, Michael Lawrence wrote:



On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès <hpa...@fhcrc.org
<mailto:hpa...@fhcrc.org>> wrote:

    Hi,


    On 05/20/2013 03:15 PM, Michael Lawrence wrote:

        On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
        <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> wrote:

            Hi Michael --


            On 5/20/2013 5:56 AM, Michael Lawrence wrote:

                While it's cool that seqlevels<- has become so flexible,
                I still claim
                that
                rename/keep/drop would be a lot easier to read, because
                they describe the
                high-level operation, and there's no reason to deprecate
                them. They're
                also
                easier to remember. For example, for dropping with
                seqlevels<-, the user
                needs
                to remember that "force" is necessary to drop. Much
                easier to just say
                "dropSeqlevels(), please". And reimplementing
                keepSeqlevels is still too
                complicated. Is it such a maintenance burden to have
                these simple
                wrappers sit
                on top of the low-level manipulators?

                Another suggestion: renameSeqlevels should not require a
                named vector (in
                cases
                where it is unnamed, require that it is parallel to the
                existing
                seqlevels, as
                with seqlevels<-).


            I didn't really indicate what drove my desire to see
            keepSeqlevels
            deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq were
            developed more
            or less independently, and have different contracts. The
            different
            contracts are confusing to the user, as is creating a usable
            help system
            when there are multiple end points for what is a common
            operation. The help
            pages of each were inadequate in different ways. And because
            the code was
            developed independently, support for different objects was
            inconsistent. So
            actually, yes, the maintenance (and use) burden for the
            previous state of
            affairs was substantial.

            On the other hand, I agree that keepSeqlevels is convenient
            as a simple
            wrapper around seqlevels<-, in the same way that setNames
            and names<- are
            both useful.

            So we could iterate to

                keepSeqlevels <-
                    function(x, value, ...)
                {
                    seqlevels(x, force=TRUE) <- value
                    x
                }

            but I'd be less enthusiastic about maintaining the original
            contract of
            keepSeqlevels, where keepSeqlevels(gr1, gr2), would have
            worked for two
            GRanges objects.


    Why would this be called keepSeqlevels() if, depending on how it's used,
    it will either add, drop, rename, or permute the seqlevels?
    Couldn't this be called setSeqlevels?


I thought that new2old was necessary for permuting.

The seqlevels setter has no 'new2old' arg.

You're right that
adding should be disallowed for keepSeqlevels(). Adding seqlevels is not
a common operation. The two common operations are:
* Permuting, usually because the data were imported in non-canonical
order (seqnameStyle was designed to address this, no?).

Permuting is straightforward with seqlevels<-:

  > seqlevels(gr)
  [1] "chr1" "chr2" "chr3"

  > seqlevels(gr) <- rev(seqlevels(gr))

  > seqlevels(gr)
  [1] "chr3" "chr2" "chr1"

* Subsetting, either by keeping or dropping.

Also straightforward:

  > seqlevels(gr, force=TRUE) <- "chr2"

  > seqlevels(gr)
  [1] "chr2"

Two main reasons: taking a
small slice of the data for prototyping, or removing problematic
chromosomes (sex, circular). This goes back to bsapply and the exclude
argument.

There are other important use cases:

  - Dropping seqlevels that are *not* in use. This happens for example
    when subsetting a BAM file with Rsamtools::filterBam to keep only
    the alignments located on a given chromosome. The entire sequence
    dictionary (in the header) is propagated to the resulting BAM file
    so when you read the file with readGAlignments() you end up with a
    bunch of useless seqlevels that you generally start by dropping.
    You don't want to drop any alignment so force=FALSE is your friend.

  - An even more common operation is renaming: 90% of the times that's
    what I use seqlevels<- for, and that's what I tell people to use
    when they need to rename their chromosomes. Also straightforward:

      > seqlevels(gr)
      [1] "chr1" "chr2" "chr3" "chrM"

      > seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
      > seqlevels(gr)
      [1] "1" "2" "3" "M"

      > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
      > seqlevels(gr)
      [1] "1"  "2"  "3"  "MT"

    Note that this is simpler to use than renameSeqlevels() which
    always required you to build the entire right value as a named
    vector.

So the added-value of keepSeqlevels() seems to boil down to:

  (a) it always uses force=TRUE

  (b) it preserves the original object and returns the modified object

If you want to restrict the use of keepSeqlevels() to permuting and
dropping, you'll also have to disallow renaming (in addition to adding).
Then its name will more or less reflect what it does (if "keep" means
"permute" or "drop"). The final result will be something that does less
than setSeqlevels() and that is not necessarily easier to read: both
will set on the object whatever vector of seqlevels they are supplied,
except that one will fail when doing so would actually result in adding
or renaming seqlevels.

H.


Michael


    H.



        Well, I wasn't even aware of that feature, so you've made your
        point about
        the documentation ;) Sounds like a good compromise to me.

        Thanks for understanding,
        Michael



            Martin


                Michael






                On Sat, May 18, 2013 at 6:00 PM, Martin Morgan
                <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
                <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>
                wrote:

                      On 05/18/2013 05:42 PM, Martin Morgan wrote:

                          Some of the most common operations are
                straight-forward, e.g.,

                                > gr = GRanges(paste0("chr", c(1:22,
                "X", "Y")), IRanges(1,
                100))
                                > seqlevels(gr) = sub("chr", "ch",
                seqlevels(gr))


                      To get a more comprehensive example I should have
                followed my own
                advice and
                      grabbed from the help page!

                            ## Rename:
                            seqlevels(txdb) <- sub("chr", "",
                seqlevels(txdb))
                            seqlevels(txdb)

                            seqlevels(txdb) <- paste0("CH", seqlevels(txdb))
                            seqlevels(txdb)

                            seqlevels(txdb)[seqlevels(__**__txdb) ==
                "CHM"] <- "M"


                            seqlevels(txdb)


                      --
                      Computational Biology / Fred Hutchinson Cancer
                Research Center
                      1100 Fairview Ave. N.
                      PO Box 19024 Seattle, WA 98109

                      Location: Arnold Building M1 B861
                      Phone: (206) 667-2793 <tel:%28206%29%20667-2793>
                <tel:%28206%29%20667-2793>




            --
            Dr. Martin Morgan, PhD

            Fred Hutchinson Cancer Research Center
            1100 Fairview Ave. N.
            PO Box 19024 Seattle, WA 98109


                 [[alternative HTML version deleted]]

        _________________________________________________
        Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
        mailing list
        https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>


    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences

    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
    Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
    Fax: (206) 667-1319 <tel:%28206%29%20667-1319>



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to