On Wed, May 22, 2013 at 11:15 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:
> On 05/21/2013 06:37 PM, Michael Lawrence wrote: > >> It's clear we're not going to agree. I will volunteer to maintain these, >> OK? I will maintain them anyway for my own use. >> > > That's fine with me. I didn't mean we shouldn't have those wrappers (and > I think Martin or Val are going to come back here with a proposal). > I just wanted to make it clear that the seqlevels setter is not that > hideous thing > > > seqlevels(x, new2old = match("chr1", seqlevels(x)), force = TRUE) <- > "chr1" > > Sorry if I kept beating again and again the same dead horse... > > I think your email got truncated again but anyway no hard feelings here. I'm glad things are being resolved. Looking forward to seeing you guys in July. > Cheers, > H. > > >> Michael >> >> >> On Tue, May 21, 2013 at 2:34 PM, Hervé Pagès <hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org>> wrote: >> >> >> >> On 05/21/2013 08:00 AM, Michael Lawrence wrote: >> >> >> >> >> On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>> wrote: >> >> Michael, >> >> >> On 05/20/2013 08:44 PM, Michael Lawrence wrote: >> >> >> >> >> On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès >> <hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>>> wrote: >> >> Hi, >> >> >> On 05/20/2013 03:15 PM, Michael Lawrence wrote: >> >> On Mon, May 20, 2013 at 3:11 PM, Martin Morgan >> <mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org>> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>> wrote: >> >> Hi Michael -- >> >> >> On 5/20/2013 5:56 AM, Michael Lawrence >> wrote: >> >> While it's cool that seqlevels<- has >> become so >> flexible, >> I still claim >> that >> rename/keep/drop would be a lot easier >> to read, >> because >> they describe the >> high-level operation, and there's no >> reason to >> deprecate >> them. They're >> also >> easier to remember. For example, for >> dropping with >> seqlevels<-, the user >> needs >> to remember that "force" is necessary >> to drop. Much >> easier to just say >> "dropSeqlevels(), please". And >> reimplementing >> keepSeqlevels is still too >> complicated. Is it such a maintenance >> burden to >> have >> these simple >> wrappers sit >> on top of the low-level manipulators? >> >> Another suggestion: renameSeqlevels >> should not >> require a >> named vector (in >> cases >> where it is unnamed, require that it is >> parallel to the >> existing >> seqlevels, as >> with seqlevels<-). >> >> >> I didn't really indicate what drove my >> desire to see >> keepSeqlevels >> deprecated. keepSeqlevels, seqlevels<-, and >> isActiveSeq were >> developed more >> or less independently, and have different >> contracts. The >> different >> contracts are confusing to the user, as is >> creating >> a usable >> help system >> when there are multiple end points for >> what is a common >> operation. The help >> pages of each were inadequate in different >> ways. >> And because >> the code was >> developed independently, support for >> different >> objects was >> inconsistent. So >> actually, yes, the maintenance (and use) >> burden for the >> previous state of >> affairs was substantial. >> >> On the other hand, I agree that >> keepSeqlevels is >> convenient >> as a simple >> wrapper around seqlevels<-, in the same >> way that >> setNames >> and names<- are >> both useful. >> >> So we could iterate to >> >> keepSeqlevels <- >> function(x, value, ...) >> { >> seqlevels(x, force=TRUE) <- value >> x >> } >> >> but I'd be less enthusiastic about >> maintaining the >> original >> contract of >> keepSeqlevels, where keepSeqlevels(gr1, >> gr2), would >> have >> worked for two >> GRanges objects. >> >> >> Why would this be called keepSeqlevels() if, >> depending on >> how it's used, >> it will either add, drop, rename, or permute the >> seqlevels? >> Couldn't this be called setSeqlevels? >> >> >> I thought that new2old was necessary for permuting. >> >> >> The seqlevels setter has no 'new2old' arg. >> >> >> You're right that >> adding should be disallowed for keepSeqlevels(). Adding >> seqlevels is not >> a common operation. The two common operations are: >> * Permuting, usually because the data were imported in >> non-canonical >> order (seqnameStyle was designed to address this, no?). >> >> >> Permuting is straightforward with seqlevels<-: >> >> > seqlevels(gr) >> [1] "chr1" "chr2" "chr3" >> >> > seqlevels(gr) <- rev(seqlevels(gr)) >> >> > seqlevels(gr) >> [1] "chr3" "chr2" "chr1" >> >> >> * Subsetting, either by keeping or dropping. >> >> >> Also straightforward: >> >> > seqlevels(gr, force=TRUE) <- "chr2" >> >> > seqlevels(gr) >> [1] "chr2" >> >> >> Two main reasons: taking a >> small slice of the data for prototyping, or removing >> problematic >> chromosomes (sex, circular). This goes back to bsapply >> and the >> exclude >> argument. >> >> >> There are other important use cases: >> >> - Dropping seqlevels that are *not* in use. This happens >> for example >> when subsetting a BAM file with Rsamtools::filterBam >> to keep only >> the alignments located on a given chromosome. The >> entire sequence >> dictionary (in the header) is propagated to the >> resulting BAM file >> so when you read the file with readGAlignments() you >> end up with a >> bunch of useless seqlevels that you generally start by >> dropping. >> You don't want to drop any alignment so force=FALSE is >> your friend. >> >> >> This is sort of tangential to the discussion, but do you really >> want to >> do this? I would preserve the universe as given by the BAM. >> >> - An even more common operation is renaming: 90% of the >> times that's >> what I use seqlevels<- for, and that's what I tell >> people to use >> when they need to rename their chromosomes. Also >> straightforward: >> >> > seqlevels(gr) >> [1] "chr1" "chr2" "chr3" "chrM" >> >> > seqlevels(gr) <- sub("^chr", "", seqlevels(gr)) >> > seqlevels(gr) >> [1] "1" "2" "3" "M" >> >> > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT" >> > seqlevels(gr) >> [1] "1" "2" "3" "MT" >> >> Note that this is simpler to use than >> renameSeqlevels() which >> always required you to build the entire right value as >> a named >> vector. >> >> >> Sure, renaming is a common use case. Not sure how I forgot that. >> >> As I wrote earlier in the thread, renameSeqlevels should be >> changed so >> as not to require naming of the vector. >> >> So the added-value of keepSeqlevels() seems to boil down to: >> >> (a) it always uses force=TRUE >> >> (b) it preserves the original object and returns the >> modified object >> >> If you want to restrict the use of keepSeqlevels() to >> permuting and >> dropping, you'll also have to disallow renaming (in >> addition to adding). >> Then its name will more or less reflect what it does (if >> "keep" means >> "permute" or "drop"). The final result will be something >> that does less >> than setSeqlevels() and that is not necessarily easier to >> read: both >> will set on the object whatever vector of seqlevels they >> are supplied, >> except that one will fail when doing so would actually >> result in adding >> or renaming seqlevels. >> >> >> So it looks like seqlevels<- is pretty powerful. Now I see that >> if I >> scroll down to the examples in the man page, I find the actual >> documentation. It's cool to learn that we can replace seqlevels >> on TxDb >> objects. >> >> >> Has been working on TranscriptDb objects for more than >> >> >> My argument has always been that since these low-level >> operations are so powerful, it's nice to have high-level >> operations to >> clarify the code. We have to think hard to know what the RHS >> will do in >> that replacement. >> >> >> When you do something like seqlevels(gr) <- new_seqlevels, you're >> saying >> "can you please replace the current seqlevels by those new ones". >> That's all. The request/contract is simple and all the complexity is >> taken care of internally by seqlevels<-. It might be that, depending >> on >> what's currently on 'gr', this will actually result in dropping, >> adding, >> permuting. Or *a combination of the 3*. Like here: >> >> >> > seqlevels(gr) >> [1] "chr1" "chr2" "chr3" >> >> > seqlevels(gr) <- c("chr4", "chr3", "chr1") >> > seqlevels(gr) >> [1] "chr4" "chr3" "chr1" >> >> # one seqlevel was drop, one was added, and the 2 seqlevels that >> were >> # preserved were reordered. >> >> When you want to subset a vector to keep all elements with an even >> index you do x[c(FALSE, TRUE)]. Is that keeping or dropping elements? >> Do we need to have a subsetting operator for keeping elements, another >> one for dropping elements, and another one for permuting them (like >> in x[sample(length(x))])? >> >> H. >> >> It's probably one of the most complex replacement >> operations; far more complex than the typical one, including >> levels<- on >> factor. There's nothing wrong with that; it's just complexity >> that we >> should be able to hide. >> >> Michael >> >> H. >> >> >> Michael >> >> >> H. >> >> >> >> Well, I wasn't even aware of that feature, so >> you've >> made your >> point about >> the documentation ;) Sounds like a good >> compromise to me. >> >> Thanks for understanding, >> Michael >> >> >> >> Martin >> >> >> Michael >> >> >> >> >> >> >> On Sat, May 18, 2013 at 6:00 PM, >> Martin Morgan >> <mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org>> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>> >> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org >> >>>>**> >> >> >> wrote: >> >> On 05/18/2013 05:42 PM, Martin >> Morgan wrote: >> >> Some of the most common >> operations are >> straight-forward, e.g., >> >> > gr = >> GRanges(paste0("chr", >> c(1:22, >> "X", "Y")), IRanges(1, >> 100)) >> > seqlevels(gr) = >> sub("chr", "ch", >> seqlevels(gr)) >> >> >> To get a more comprehensive >> example I >> should have >> followed my own >> advice and >> grabbed from the help page! >> >> ## Rename: >> seqlevels(txdb) <- >> sub("chr", "", >> seqlevels(txdb)) >> seqlevels(txdb) >> >> seqlevels(txdb) <- >> paste0("CH", >> seqlevels(txdb)) >> seqlevels(txdb) >> >> >> seqlevels(txdb)[seqlevels(__****______txdb) == >> >> >> >> "CHM"] <- "M" >> >> >> seqlevels(txdb) >> >> >> -- >> Computational Biology / Fred >> Hutchinson >> Cancer >> Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 >> <tel:%28206%29%20667-2793> >> <tel:%28206%29%20667-2793> <tel:%28206%29%20667-2793> >> <tel:%28206%29%20667-2793> >> >> >> >> >> -- >> Dr. Martin Morgan, PhD >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> >> [[alternative HTML version deleted]] >> >> >> ______________________________**_______________________ >> >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org> >> > >> <mailto:Bioc-devel@r-project._**_org >> <mailto:Bioc-devel@r-project.**org <Bioc-devel@r-project.org>>> >> <mailto:Bioc-devel@r-project. >> <mailto:Bioc-devel@r-project.>**____org >> <mailto:Bioc-devel@r-project._**_org >> <mailto:Bioc-devel@r-project.**org <Bioc-devel@r-project.org>>>> >> mailing list >> >> https://stat.ethz.ch/mailman/_**_____listinfo/bioc-devel<https://stat.ethz.ch/mailman/______listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**____listinfo/bioc-devel<https://stat.ethz.ch/mailman/____listinfo/bioc-devel> >> > >> >> >> <https://stat.ethz.ch/mailman/**____listinfo/bioc-devel<https://stat.ethz.ch/mailman/____listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**__listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> >> >> >> >> >> <https://stat.ethz.ch/mailman/**____listinfo/bioc-devel<https://stat.ethz.ch/mailman/____listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**__listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> > >> >> <https://stat.ethz.ch/mailman/**__listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> >>> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>> >> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> <tel:%28206%29%20667-5791> >> <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> <tel:%28206%29%20667-1319> >> <tel:%28206%29%20667-1319> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> <tel:%28206%29%20667-1319> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel