It's clear we're not going to agree. I will volunteer to maintain these, OK? I will maintain them anyway for my own use.
Michael On Tue, May 21, 2013 at 2:34 PM, Hervé Pagès <hpa...@fhcrc.org> wrote: > > > On 05/21/2013 08:00 AM, Michael Lawrence wrote: > >> >> >> >> On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org>> wrote: >> >> Michael, >> >> >> On 05/20/2013 08:44 PM, Michael Lawrence wrote: >> >> >> >> >> On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès <hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>> wrote: >> >> Hi, >> >> >> On 05/20/2013 03:15 PM, Michael Lawrence wrote: >> >> On Mon, May 20, 2013 at 3:11 PM, Martin Morgan >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>> wrote: >> >> Hi Michael -- >> >> >> On 5/20/2013 5:56 AM, Michael Lawrence wrote: >> >> While it's cool that seqlevels<- has become so >> flexible, >> I still claim >> that >> rename/keep/drop would be a lot easier to read, >> because >> they describe the >> high-level operation, and there's no reason to >> deprecate >> them. They're >> also >> easier to remember. For example, for dropping >> with >> seqlevels<-, the user >> needs >> to remember that "force" is necessary to drop. >> Much >> easier to just say >> "dropSeqlevels(), please". And reimplementing >> keepSeqlevels is still too >> complicated. Is it such a maintenance burden to >> have >> these simple >> wrappers sit >> on top of the low-level manipulators? >> >> Another suggestion: renameSeqlevels should not >> require a >> named vector (in >> cases >> where it is unnamed, require that it is >> parallel to the >> existing >> seqlevels, as >> with seqlevels<-). >> >> >> I didn't really indicate what drove my desire to see >> keepSeqlevels >> deprecated. keepSeqlevels, seqlevels<-, and >> isActiveSeq were >> developed more >> or less independently, and have different >> contracts. The >> different >> contracts are confusing to the user, as is creating >> a usable >> help system >> when there are multiple end points for what is a >> common >> operation. The help >> pages of each were inadequate in different ways. >> And because >> the code was >> developed independently, support for different >> objects was >> inconsistent. So >> actually, yes, the maintenance (and use) burden for >> the >> previous state of >> affairs was substantial. >> >> On the other hand, I agree that keepSeqlevels is >> convenient >> as a simple >> wrapper around seqlevels<-, in the same way that >> setNames >> and names<- are >> both useful. >> >> So we could iterate to >> >> keepSeqlevels <- >> function(x, value, ...) >> { >> seqlevels(x, force=TRUE) <- value >> x >> } >> >> but I'd be less enthusiastic about maintaining the >> original >> contract of >> keepSeqlevels, where keepSeqlevels(gr1, gr2), would >> have >> worked for two >> GRanges objects. >> >> >> Why would this be called keepSeqlevels() if, depending on >> how it's used, >> it will either add, drop, rename, or permute the seqlevels? >> Couldn't this be called setSeqlevels? >> >> >> I thought that new2old was necessary for permuting. >> >> >> The seqlevels setter has no 'new2old' arg. >> >> >> You're right that >> adding should be disallowed for keepSeqlevels(). Adding >> seqlevels is not >> a common operation. The two common operations are: >> * Permuting, usually because the data were imported in >> non-canonical >> order (seqnameStyle was designed to address this, no?). >> >> >> Permuting is straightforward with seqlevels<-: >> >> > seqlevels(gr) >> [1] "chr1" "chr2" "chr3" >> >> > seqlevels(gr) <- rev(seqlevels(gr)) >> >> > seqlevels(gr) >> [1] "chr3" "chr2" "chr1" >> >> >> * Subsetting, either by keeping or dropping. >> >> >> Also straightforward: >> >> > seqlevels(gr, force=TRUE) <- "chr2" >> >> > seqlevels(gr) >> [1] "chr2" >> >> >> Two main reasons: taking a >> small slice of the data for prototyping, or removing problematic >> chromosomes (sex, circular). This goes back to bsapply and the >> exclude >> argument. >> >> >> There are other important use cases: >> >> - Dropping seqlevels that are *not* in use. This happens for >> example >> when subsetting a BAM file with Rsamtools::filterBam to keep only >> the alignments located on a given chromosome. The entire sequence >> dictionary (in the header) is propagated to the resulting BAM >> file >> so when you read the file with readGAlignments() you end up with >> a >> bunch of useless seqlevels that you generally start by dropping. >> You don't want to drop any alignment so force=FALSE is your >> friend. >> >> >> This is sort of tangential to the discussion, but do you really want to >> do this? I would preserve the universe as given by the BAM. >> >> - An even more common operation is renaming: 90% of the times >> that's >> what I use seqlevels<- for, and that's what I tell people to use >> when they need to rename their chromosomes. Also straightforward: >> >> > seqlevels(gr) >> [1] "chr1" "chr2" "chr3" "chrM" >> >> > seqlevels(gr) <- sub("^chr", "", seqlevels(gr)) >> > seqlevels(gr) >> [1] "1" "2" "3" "M" >> >> > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT" >> > seqlevels(gr) >> [1] "1" "2" "3" "MT" >> >> Note that this is simpler to use than renameSeqlevels() which >> always required you to build the entire right value as a named >> vector. >> >> >> Sure, renaming is a common use case. Not sure how I forgot that. >> >> As I wrote earlier in the thread, renameSeqlevels should be changed so >> as not to require naming of the vector. >> >> So the added-value of keepSeqlevels() seems to boil down to: >> >> (a) it always uses force=TRUE >> >> (b) it preserves the original object and returns the modified >> object >> >> If you want to restrict the use of keepSeqlevels() to permuting and >> dropping, you'll also have to disallow renaming (in addition to >> adding). >> Then its name will more or less reflect what it does (if "keep" means >> "permute" or "drop"). The final result will be something that does >> less >> than setSeqlevels() and that is not necessarily easier to read: both >> will set on the object whatever vector of seqlevels they are supplied, >> except that one will fail when doing so would actually result in >> adding >> or renaming seqlevels. >> >> >> So it looks like seqlevels<- is pretty powerful. Now I see that if I >> scroll down to the examples in the man page, I find the actual >> documentation. It's cool to learn that we can replace seqlevels on TxDb >> objects. >> > > Has been working on TranscriptDb objects for more than > > > My argument has always been that since these low-level >> operations are so powerful, it's nice to have high-level operations to >> clarify the code. We have to think hard to know what the RHS will do in >> that replacement. >> > > When you do something like seqlevels(gr) <- new_seqlevels, you're saying > "can you please replace the current seqlevels by those new ones". > That's all. The request/contract is simple and all the complexity is > taken care of internally by seqlevels<-. It might be that, depending on > what's currently on 'gr', this will actually result in dropping, adding, > permuting. Or *a combination of the 3*. Like here: > > > > seqlevels(gr) > [1] "chr1" "chr2" "chr3" > > > seqlevels(gr) <- c("chr4", "chr3", "chr1") > > seqlevels(gr) > [1] "chr4" "chr3" "chr1" > > # one seqlevel was drop, one was added, and the 2 seqlevels that were > # preserved were reordered. > > When you want to subset a vector to keep all elements with an even > index you do x[c(FALSE, TRUE)]. Is that keeping or dropping elements? > Do we need to have a subsetting operator for keeping elements, another > one for dropping elements, and another one for permuting them (like > in x[sample(length(x))])? > > H. > > It's probably one of the most complex replacement >> operations; far more complex than the typical one, including levels<- on >> factor. There's nothing wrong with that; it's just complexity that we >> should be able to hide. >> >> Michael >> >> H. >> >> >> Michael >> >> >> H. >> >> >> >> Well, I wasn't even aware of that feature, so you've >> made your >> point about >> the documentation ;) Sounds like a good compromise to me. >> >> Thanks for understanding, >> Michael >> >> >> >> Martin >> >> >> Michael >> >> >> >> >> >> >> On Sat, May 18, 2013 at 6:00 PM, Martin Morgan >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> >> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org>>>> >> >> wrote: >> >> On 05/18/2013 05:42 PM, Martin Morgan >> wrote: >> >> Some of the most common operations are >> straight-forward, e.g., >> >> > gr = GRanges(paste0("chr", >> c(1:22, >> "X", "Y")), IRanges(1, >> 100)) >> > seqlevels(gr) = sub("chr", >> "ch", >> seqlevels(gr)) >> >> >> To get a more comprehensive example I >> should have >> followed my own >> advice and >> grabbed from the help page! >> >> ## Rename: >> seqlevels(txdb) <- sub("chr", "", >> seqlevels(txdb)) >> seqlevels(txdb) >> >> seqlevels(txdb) <- paste0("CH", >> seqlevels(txdb)) >> seqlevels(txdb) >> >> >> seqlevels(txdb)[seqlevels(__****____txdb) == >> >> >> "CHM"] <- "M" >> >> >> seqlevels(txdb) >> >> >> -- >> Computational Biology / Fred Hutchinson >> Cancer >> Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> <tel:%28206%29%20667-2793> <tel:%28206%29%20667-2793> >> <tel:%28206%29%20667-2793> >> >> >> >> >> -- >> Dr. Martin Morgan, PhD >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> >> [[alternative HTML version deleted]] >> >> ______________________________**_____________________ >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org> >> > >> <mailto:Bioc-devel@r-project._**_org >> <mailto:Bioc-devel@r-project.**org <Bioc-devel@r-project.org>>> >> mailing list >> >> https://stat.ethz.ch/mailman/_**___listinfo/bioc-devel<https://stat.ethz.ch/mailman/____listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**__listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> > >> >> >> >> <https://stat.ethz.ch/mailman/**__listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>> >> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> <tel:%28206%29%20667-1319> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel