Michael, On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès <hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>> wrote: Hi, On 05/20/2013 03:15 PM, Michael Lawrence wrote: On Mon, May 20, 2013 at 3:11 PM, Martin Morgan <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> wrote: Hi Michael -- On 5/20/2013 5:56 AM, Michael Lawrence wrote: While it's cool that seqlevels<- has become so flexible, I still claim that rename/keep/drop would be a lot easier to read, because they describe the high-level operation, and there's no reason to deprecate them. They're also easier to remember. For example, for dropping with seqlevels<-, the user needs to remember that "force" is necessary to drop. Much easier to just say "dropSeqlevels(), please". And reimplementing keepSeqlevels is still too complicated. Is it such a maintenance burden to have these simple wrappers sit on top of the low-level manipulators? Another suggestion: renameSeqlevels should not require a named vector (in cases where it is unnamed, require that it is parallel to the existing seqlevels, as with seqlevels<-). I didn't really indicate what drove my desire to see keepSeqlevels deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq were developed more or less independently, and have different contracts. The different contracts are confusing to the user, as is creating a usable help system when there are multiple end points for what is a common operation. The help pages of each were inadequate in different ways. And because the code was developed independently, support for different objects was inconsistent. So actually, yes, the maintenance (and use) burden for the previous state of affairs was substantial. On the other hand, I agree that keepSeqlevels is convenient as a simple wrapper around seqlevels<-, in the same way that setNames and names<- are both useful. So we could iterate to keepSeqlevels <- function(x, value, ...) { seqlevels(x, force=TRUE) <- value x } but I'd be less enthusiastic about maintaining the original contract of keepSeqlevels, where keepSeqlevels(gr1, gr2), would have worked for two GRanges objects. Why would this be called keepSeqlevels() if, depending on how it's used, it will either add, drop, rename, or permute the seqlevels? Couldn't this be called setSeqlevels? I thought that new2old was necessary for permuting.
The seqlevels setter has no 'new2old' arg.
You're right that adding should be disallowed for keepSeqlevels(). Adding seqlevels is not a common operation. The two common operations are: * Permuting, usually because the data were imported in non-canonical order (seqnameStyle was designed to address this, no?).
Permuting is straightforward with seqlevels<-: > seqlevels(gr) [1] "chr1" "chr2" "chr3" > seqlevels(gr) <- rev(seqlevels(gr)) > seqlevels(gr) [1] "chr3" "chr2" "chr1"
* Subsetting, either by keeping or dropping.
Also straightforward: > seqlevels(gr, force=TRUE) <- "chr2" > seqlevels(gr) [1] "chr2"
Two main reasons: taking a small slice of the data for prototyping, or removing problematic chromosomes (sex, circular). This goes back to bsapply and the exclude argument.
There are other important use cases: - Dropping seqlevels that are *not* in use. This happens for example when subsetting a BAM file with Rsamtools::filterBam to keep only the alignments located on a given chromosome. The entire sequence dictionary (in the header) is propagated to the resulting BAM file so when you read the file with readGAlignments() you end up with a bunch of useless seqlevels that you generally start by dropping. You don't want to drop any alignment so force=FALSE is your friend. - An even more common operation is renaming: 90% of the times that's what I use seqlevels<- for, and that's what I tell people to use when they need to rename their chromosomes. Also straightforward: > seqlevels(gr) [1] "chr1" "chr2" "chr3" "chrM" > seqlevels(gr) <- sub("^chr", "", seqlevels(gr)) > seqlevels(gr) [1] "1" "2" "3" "M" > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT" > seqlevels(gr) [1] "1" "2" "3" "MT" Note that this is simpler to use than renameSeqlevels() which always required you to build the entire right value as a named vector. So the added-value of keepSeqlevels() seems to boil down to: (a) it always uses force=TRUE (b) it preserves the original object and returns the modified object If you want to restrict the use of keepSeqlevels() to permuting and dropping, you'll also have to disallow renaming (in addition to adding). Then its name will more or less reflect what it does (if "keep" means "permute" or "drop"). The final result will be something that does less than setSeqlevels() and that is not necessarily easier to read: both will set on the object whatever vector of seqlevels they are supplied, except that one will fail when doing so would actually result in adding or renaming seqlevels. H.
Michael H. Well, I wasn't even aware of that feature, so you've made your point about the documentation ;) Sounds like a good compromise to me. Thanks for understanding, Michael Martin Michael On Sat, May 18, 2013 at 6:00 PM, Martin Morgan <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>> wrote: On 05/18/2013 05:42 PM, Martin Morgan wrote: Some of the most common operations are straight-forward, e.g., > gr = GRanges(paste0("chr", c(1:22, "X", "Y")), IRanges(1, 100)) > seqlevels(gr) = sub("chr", "ch", seqlevels(gr)) To get a more comprehensive example I should have followed my own advice and grabbed from the help page! ## Rename: seqlevels(txdb) <- sub("chr", "", seqlevels(txdb)) seqlevels(txdb) seqlevels(txdb) <- paste0("CH", seqlevels(txdb)) seqlevels(txdb) seqlevels(txdb)[seqlevels(__**__txdb) == "CHM"] <- "M" seqlevels(txdb) -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793 <tel:%28206%29%20667-2793> <tel:%28206%29%20667-2793> -- Dr. Martin Morgan, PhD Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 [[alternative HTML version deleted]] _________________________________________________ Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing list https://stat.ethz.ch/mailman/__listinfo/bioc-devel <https://stat.ethz.ch/mailman/listinfo/bioc-devel> -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
-- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpa...@fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319 _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel