On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org> wrote:
> Michael, > > > On 05/20/2013 08:44 PM, Michael Lawrence wrote: > >> >> >> >> On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès <hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org>> wrote: >> >> Hi, >> >> >> On 05/20/2013 03:15 PM, Michael Lawrence wrote: >> >> On Mon, May 20, 2013 at 3:11 PM, Martin Morgan >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> wrote: >> >> Hi Michael -- >> >> >> On 5/20/2013 5:56 AM, Michael Lawrence wrote: >> >> While it's cool that seqlevels<- has become so flexible, >> I still claim >> that >> rename/keep/drop would be a lot easier to read, because >> they describe the >> high-level operation, and there's no reason to deprecate >> them. They're >> also >> easier to remember. For example, for dropping with >> seqlevels<-, the user >> needs >> to remember that "force" is necessary to drop. Much >> easier to just say >> "dropSeqlevels(), please". And reimplementing >> keepSeqlevels is still too >> complicated. Is it such a maintenance burden to have >> these simple >> wrappers sit >> on top of the low-level manipulators? >> >> Another suggestion: renameSeqlevels should not require a >> named vector (in >> cases >> where it is unnamed, require that it is parallel to the >> existing >> seqlevels, as >> with seqlevels<-). >> >> >> I didn't really indicate what drove my desire to see >> keepSeqlevels >> deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq were >> developed more >> or less independently, and have different contracts. The >> different >> contracts are confusing to the user, as is creating a usable >> help system >> when there are multiple end points for what is a common >> operation. The help >> pages of each were inadequate in different ways. And because >> the code was >> developed independently, support for different objects was >> inconsistent. So >> actually, yes, the maintenance (and use) burden for the >> previous state of >> affairs was substantial. >> >> On the other hand, I agree that keepSeqlevels is convenient >> as a simple >> wrapper around seqlevels<-, in the same way that setNames >> and names<- are >> both useful. >> >> So we could iterate to >> >> keepSeqlevels <- >> function(x, value, ...) >> { >> seqlevels(x, force=TRUE) <- value >> x >> } >> >> but I'd be less enthusiastic about maintaining the original >> contract of >> keepSeqlevels, where keepSeqlevels(gr1, gr2), would have >> worked for two >> GRanges objects. >> >> >> Why would this be called keepSeqlevels() if, depending on how it's >> used, >> it will either add, drop, rename, or permute the seqlevels? >> Couldn't this be called setSeqlevels? >> >> >> I thought that new2old was necessary for permuting. >> > > The seqlevels setter has no 'new2old' arg. > > > You're right that >> adding should be disallowed for keepSeqlevels(). Adding seqlevels is not >> a common operation. The two common operations are: >> * Permuting, usually because the data were imported in non-canonical >> order (seqnameStyle was designed to address this, no?). >> > > Permuting is straightforward with seqlevels<-: > > > seqlevels(gr) > [1] "chr1" "chr2" "chr3" > > > seqlevels(gr) <- rev(seqlevels(gr)) > > > seqlevels(gr) > [1] "chr3" "chr2" "chr1" > > > * Subsetting, either by keeping or dropping. >> > > Also straightforward: > > > seqlevels(gr, force=TRUE) <- "chr2" > > > seqlevels(gr) > [1] "chr2" > > > Two main reasons: taking a >> small slice of the data for prototyping, or removing problematic >> chromosomes (sex, circular). This goes back to bsapply and the exclude >> argument. >> > > There are other important use cases: > > - Dropping seqlevels that are *not* in use. This happens for example > when subsetting a BAM file with Rsamtools::filterBam to keep only > the alignments located on a given chromosome. The entire sequence > dictionary (in the header) is propagated to the resulting BAM file > so when you read the file with readGAlignments() you end up with a > bunch of useless seqlevels that you generally start by dropping. > You don't want to drop any alignment so force=FALSE is your friend. > > This is sort of tangential to the discussion, but do you really want to do this? I would preserve the universe as given by the BAM. > - An even more common operation is renaming: 90% of the times that's > what I use seqlevels<- for, and that's what I tell people to use > when they need to rename their chromosomes. Also straightforward: > > > seqlevels(gr) > [1] "chr1" "chr2" "chr3" "chrM" > > > seqlevels(gr) <- sub("^chr", "", seqlevels(gr)) > > seqlevels(gr) > [1] "1" "2" "3" "M" > > > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT" > > seqlevels(gr) > [1] "1" "2" "3" "MT" > > Note that this is simpler to use than renameSeqlevels() which > always required you to build the entire right value as a named > vector. > > Sure, renaming is a common use case. Not sure how I forgot that. As I wrote earlier in the thread, renameSeqlevels should be changed so as not to require naming of the vector. > So the added-value of keepSeqlevels() seems to boil down to: > > (a) it always uses force=TRUE > > (b) it preserves the original object and returns the modified object > > If you want to restrict the use of keepSeqlevels() to permuting and > dropping, you'll also have to disallow renaming (in addition to adding). > Then its name will more or less reflect what it does (if "keep" means > "permute" or "drop"). The final result will be something that does less > than setSeqlevels() and that is not necessarily easier to read: both > will set on the object whatever vector of seqlevels they are supplied, > except that one will fail when doing so would actually result in adding > or renaming seqlevels. > > So it looks like seqlevels<- is pretty powerful. Now I see that if I scroll down to the examples in the man page, I find the actual documentation. It's cool to learn that we can replace seqlevels on TxDb objects. My argument has always been that since these low-level operations are so powerful, it's nice to have high-level operations to clarify the code. We have to think hard to know what the RHS will do in that replacement. It's probably one of the most complex replacement operations; far more complex than the typical one, including levels<- on factor. There's nothing wrong with that; it's just complexity that we should be able to hide. Michael H. > > >> Michael >> >> >> H. >> >> >> >> Well, I wasn't even aware of that feature, so you've made your >> point about >> the documentation ;) Sounds like a good compromise to me. >> >> Thanks for understanding, >> Michael >> >> >> >> Martin >> >> >> Michael >> >> >> >> >> >> >> On Sat, May 18, 2013 at 6:00 PM, Martin Morgan >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>> >> >> wrote: >> >> On 05/18/2013 05:42 PM, Martin Morgan wrote: >> >> Some of the most common operations are >> straight-forward, e.g., >> >> > gr = GRanges(paste0("chr", c(1:22, >> "X", "Y")), IRanges(1, >> 100)) >> > seqlevels(gr) = sub("chr", "ch", >> seqlevels(gr)) >> >> >> To get a more comprehensive example I should have >> followed my own >> advice and >> grabbed from the help page! >> >> ## Rename: >> seqlevels(txdb) <- sub("chr", "", >> seqlevels(txdb)) >> seqlevels(txdb) >> >> seqlevels(txdb) <- paste0("CH", >> seqlevels(txdb)) >> seqlevels(txdb) >> >> seqlevels(txdb)[seqlevels(__****__txdb) == >> >> "CHM"] <- "M" >> >> >> seqlevels(txdb) >> >> >> -- >> Computational Biology / Fred Hutchinson Cancer >> Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 <tel:%28206%29%20667-2793> >> <tel:%28206%29%20667-2793> >> >> >> >> >> -- >> Dr. Martin Morgan, PhD >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> >> [[alternative HTML version deleted]] >> >> ______________________________**___________________ >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org> >> > >> mailing list >> >> https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> >> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> > >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel