I am replying to this as Michael mentions that this is a powerful operation. Here's my problem that I believe is related, but I do not see a straightforward solution
> bhmm19uni GRanges with 559456 ranges and 4 metadata columns: seqnames ranges strand | name score <Rle> <IRanges> <Rle> | <character> <numeric> 1 chr1 [ 67229025, 67341224] * | 13_Heterochrom/lo 0 2 chr1 [203057955, 203060154] * | 12_Repressed 0 3 chr1 [ 8458427, 8486226] * | 13_Heterochrom/lo 0 4 chr1 [ 16904427, 16904826] * | 5_Strong_Enhancer 0 5 chr1 [ 25289427, 25293426] * | 11_Weak_Txn 0 ... ... ... ... ... ... ... 570766 chr22 [51100931, 51101330] * | 2_Weak_Promoter 0 570767 chr22 [51101331, 51101530] * | 6_Weak_Enhancer 0 570768 chr22 [51101531, 51101730] * | 2_Weak_Promoter 0 570769 chr22 [51101731, 51178130] * | 13_Heterochrom/lo 0 570770 chr22 [51178131, 51178330] * | 15_Repetitive/CNV 0 thick numcode <IRanges> <character> 1 [ 67001613, 67113812] 13 2 [201324578, 201326777] 12 3 [ 8381014, 8408813] 13 4 [ 16777014, 16777413] 5 5 [ 25162014, 25166013] 11 ... ... ... 570766 [49447797, 49448196] 2 570767 [49448197, 49448396] 6 570768 [49448397, 49448596] 2 570769 [49448597, 49524996] 13 570770 [49524997, 49525196] 15 --- seqlengths: chr1 chr10 chr11 chr5 ... chr2 chr21 chr4 249250621 135534747 135006516 180915260 ... 243199373 48129895 191154276 If I try to make a karyogram with ggbio, the chromosomes are in an unnatural order. What is the right approach to getting the seqinfo in a natural order with respect to chromosome names and lengths? Reassigning seqlevels seems dangerous but I haven't experimented fully. It must be a common use case but I do not see it in doc. On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence < lawrence.mich...@gene.com> wrote: > On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org> wrote: > > > Michael, > > > > > > On 05/20/2013 08:44 PM, Michael Lawrence wrote: > > > >> > >> > >> > >> On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès <hpa...@fhcrc.org > >> <mailto:hpa...@fhcrc.org>> wrote: > >> > >> Hi, > >> > >> > >> On 05/20/2013 03:15 PM, Michael Lawrence wrote: > >> > >> On Mon, May 20, 2013 at 3:11 PM, Martin Morgan > >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> wrote: > >> > >> Hi Michael -- > >> > >> > >> On 5/20/2013 5:56 AM, Michael Lawrence wrote: > >> > >> While it's cool that seqlevels<- has become so flexible, > >> I still claim > >> that > >> rename/keep/drop would be a lot easier to read, because > >> they describe the > >> high-level operation, and there's no reason to deprecate > >> them. They're > >> also > >> easier to remember. For example, for dropping with > >> seqlevels<-, the user > >> needs > >> to remember that "force" is necessary to drop. Much > >> easier to just say > >> "dropSeqlevels(), please". And reimplementing > >> keepSeqlevels is still too > >> complicated. Is it such a maintenance burden to have > >> these simple > >> wrappers sit > >> on top of the low-level manipulators? > >> > >> Another suggestion: renameSeqlevels should not require a > >> named vector (in > >> cases > >> where it is unnamed, require that it is parallel to the > >> existing > >> seqlevels, as > >> with seqlevels<-). > >> > >> > >> I didn't really indicate what drove my desire to see > >> keepSeqlevels > >> deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq were > >> developed more > >> or less independently, and have different contracts. The > >> different > >> contracts are confusing to the user, as is creating a usable > >> help system > >> when there are multiple end points for what is a common > >> operation. The help > >> pages of each were inadequate in different ways. And because > >> the code was > >> developed independently, support for different objects was > >> inconsistent. So > >> actually, yes, the maintenance (and use) burden for the > >> previous state of > >> affairs was substantial. > >> > >> On the other hand, I agree that keepSeqlevels is convenient > >> as a simple > >> wrapper around seqlevels<-, in the same way that setNames > >> and names<- are > >> both useful. > >> > >> So we could iterate to > >> > >> keepSeqlevels <- > >> function(x, value, ...) > >> { > >> seqlevels(x, force=TRUE) <- value > >> x > >> } > >> > >> but I'd be less enthusiastic about maintaining the original > >> contract of > >> keepSeqlevels, where keepSeqlevels(gr1, gr2), would have > >> worked for two > >> GRanges objects. > >> > >> > >> Why would this be called keepSeqlevels() if, depending on how it's > >> used, > >> it will either add, drop, rename, or permute the seqlevels? > >> Couldn't this be called setSeqlevels? > >> > >> > >> I thought that new2old was necessary for permuting. > >> > > > > The seqlevels setter has no 'new2old' arg. > > > > > > You're right that > >> adding should be disallowed for keepSeqlevels(). Adding seqlevels is not > >> a common operation. The two common operations are: > >> * Permuting, usually because the data were imported in non-canonical > >> order (seqnameStyle was designed to address this, no?). > >> > > > > Permuting is straightforward with seqlevels<-: > > > > > seqlevels(gr) > > [1] "chr1" "chr2" "chr3" > > > > > seqlevels(gr) <- rev(seqlevels(gr)) > > > > > seqlevels(gr) > > [1] "chr3" "chr2" "chr1" > > > > > > * Subsetting, either by keeping or dropping. > >> > > > > Also straightforward: > > > > > seqlevels(gr, force=TRUE) <- "chr2" > > > > > seqlevels(gr) > > [1] "chr2" > > > > > > Two main reasons: taking a > >> small slice of the data for prototyping, or removing problematic > >> chromosomes (sex, circular). This goes back to bsapply and the exclude > >> argument. > >> > > > > There are other important use cases: > > > > - Dropping seqlevels that are *not* in use. This happens for example > > when subsetting a BAM file with Rsamtools::filterBam to keep only > > the alignments located on a given chromosome. The entire sequence > > dictionary (in the header) is propagated to the resulting BAM file > > so when you read the file with readGAlignments() you end up with a > > bunch of useless seqlevels that you generally start by dropping. > > You don't want to drop any alignment so force=FALSE is your friend. > > > > > This is sort of tangential to the discussion, but do you really want to do > this? I would preserve the universe as given by the BAM. > > > > - An even more common operation is renaming: 90% of the times that's > > what I use seqlevels<- for, and that's what I tell people to use > > when they need to rename their chromosomes. Also straightforward: > > > > > seqlevels(gr) > > [1] "chr1" "chr2" "chr3" "chrM" > > > > > seqlevels(gr) <- sub("^chr", "", seqlevels(gr)) > > > seqlevels(gr) > > [1] "1" "2" "3" "M" > > > > > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT" > > > seqlevels(gr) > > [1] "1" "2" "3" "MT" > > > > Note that this is simpler to use than renameSeqlevels() which > > always required you to build the entire right value as a named > > vector. > > > > > Sure, renaming is a common use case. Not sure how I forgot that. > > As I wrote earlier in the thread, renameSeqlevels should be changed so as > not to require naming of the vector. > > > > So the added-value of keepSeqlevels() seems to boil down to: > > > > (a) it always uses force=TRUE > > > > (b) it preserves the original object and returns the modified object > > > > If you want to restrict the use of keepSeqlevels() to permuting and > > dropping, you'll also have to disallow renaming (in addition to adding). > > Then its name will more or less reflect what it does (if "keep" means > > "permute" or "drop"). The final result will be something that does less > > than setSeqlevels() and that is not necessarily easier to read: both > > will set on the object whatever vector of seqlevels they are supplied, > > except that one will fail when doing so would actually result in adding > > or renaming seqlevels. > > > > > So it looks like seqlevels<- is pretty powerful. Now I see that if I scroll > down to the examples in the man page, I find the actual documentation. It's > cool to learn that we can replace seqlevels on TxDb objects. My argument > has always been that since these low-level operations are so powerful, it's > nice to have high-level operations to clarify the code. We have to think > hard to know what the RHS will do in that replacement. It's probably one of > the most complex replacement operations; far more complex than the typical > one, including levels<- on factor. There's nothing wrong with that; it's > just complexity that we should be able to hide. > > Michael > > H. > > > > > >> Michael > >> > >> > >> H. > >> > >> > >> > >> Well, I wasn't even aware of that feature, so you've made your > >> point about > >> the documentation ;) Sounds like a good compromise to me. > >> > >> Thanks for understanding, > >> Michael > >> > >> > >> > >> Martin > >> > >> > >> Michael > >> > >> > >> > >> > >> > >> > >> On Sat, May 18, 2013 at 6:00 PM, Martin Morgan > >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> > >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org > >>> > >> > >> wrote: > >> > >> On 05/18/2013 05:42 PM, Martin Morgan wrote: > >> > >> Some of the most common operations are > >> straight-forward, e.g., > >> > >> > gr = GRanges(paste0("chr", c(1:22, > >> "X", "Y")), IRanges(1, > >> 100)) > >> > seqlevels(gr) = sub("chr", "ch", > >> seqlevels(gr)) > >> > >> > >> To get a more comprehensive example I should have > >> followed my own > >> advice and > >> grabbed from the help page! > >> > >> ## Rename: > >> seqlevels(txdb) <- sub("chr", "", > >> seqlevels(txdb)) > >> seqlevels(txdb) > >> > >> seqlevels(txdb) <- paste0("CH", > >> seqlevels(txdb)) > >> seqlevels(txdb) > >> > >> seqlevels(txdb)[seqlevels(__****__txdb) == > >> > >> "CHM"] <- "M" > >> > >> > >> seqlevels(txdb) > >> > >> > >> -- > >> Computational Biology / Fred Hutchinson Cancer > >> Research Center > >> 1100 Fairview Ave. N. > >> PO Box 19024 Seattle, WA 98109 > >> > >> Location: Arnold Building M1 B861 > >> Phone: (206) 667-2793 <tel:%28206%29%20667-2793> > >> <tel:%28206%29%20667-2793> > >> > >> > >> > >> > >> -- > >> Dr. Martin Morgan, PhD > >> > >> Fred Hutchinson Cancer Research Center > >> 1100 Fairview Ave. N. > >> PO Box 19024 Seattle, WA 98109 > >> > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________**___________________ > >> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.**org< > Bioc-devel@r-project.org> > >> > > >> mailing list > >> https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel< > https://stat.ethz.ch/mailman/__listinfo/bioc-devel> > >> > >> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel< > https://stat.ethz.ch/mailman/listinfo/bioc-devel> > >> > > >> > >> > >> -- > >> Hervé Pagès > >> > >> Program in Computational Biology > >> Division of Public Health Sciences > >> > >> Fred Hutchinson Cancer Research Center > >> 1100 Fairview Ave. N, M1-B514 > >> P.O. Box 19024 > >> Seattle, WA 98109-1024 > >> > >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> > >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> > >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> > >> > >> > >> > > -- > > Hervé Pagès > > > > Program in Computational Biology > > Division of Public Health Sciences > > Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N, M1-B514 > > P.O. Box 19024 > > Seattle, WA 98109-1024 > > > > E-mail: hpa...@fhcrc.org > > Phone: (206) 667-5791 > > Fax: (206) 667-1319 > > > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel