Nice. I thought you had made something like this. What would be extremely useful is a function that just takes a GRanges and return a fixed version - less typing, and easier to understand what goes on.
Best, Kasper On Tue, Sep 17, 2013 at 4:22 PM, Hervé Pagès <hpa...@fhcrc.org> wrote: > OK I just made sortSeqlevels() a generic with a method for character > vectors and a default method. So you can do sortSeqlevels() directly on > a Seqinfo object, a GRanges object and anything that contains a Seqinfo > component: > > > gr > GRanges with 0 ranges and 0 metadata columns: > seqnames ranges strand > <Rle> <IRanges> <Rle> > --- > seqlengths: > chr2RHet chr4 chrUextra chrYHet ... chr3R chr2R > chrX > NA NA NA NA ... NA NA > NA > > > sortSeqlevels(gr) > GRanges with 0 ranges and 0 metadata columns: > seqnames ranges strand > <Rle> <IRanges> <Rle> > --- > seqlengths: > chr2R chr3L chr3R chr4 ... chrXHet chrYHet > chrUextra > NA NA NA NA ... NA NA > NA > > H. > > > On 09/17/2013 01:05 PM, Michael Lawrence wrote: > >> Thanks, Hervé, that's great. >> >> Would it make sense to have a sort or sortSeqlevels method for Seqinfo >> and then a sortSeqlevels,ANY method that just does seqinfo(x) <- >> sort(seqinfo(x))? >> >> Michael >> >> >> >> On Tue, Sep 17, 2013 at 12:56 PM, Hervé Pagès <hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org>> wrote: >> >> On 09/17/2013 10:58 AM, Hervé Pagès wrote: >> >> For sorting the chromosome names in "natural" order, you can use >> the makeSeqnameIds() helper function in GenomicRanges: >> >> seqlevels(gr) <- seqlevels(gr)[makeSeqnameIds(_** >> _seqlevels(gr))] >> >> >> I meant: >> >> seqlevels(gr) <- >> seqlevels(gr)[order(__**makeSeqnameIds(seqlevels(gr)))**__] >> >> >> Probably the source of Michael's confusion... sorry. >> >> sortSeqlevels() just added to GenomicRanges 1.13.44: >> >> seqlevels <- c("chr2RHet", "chr4", "chrUextra", "chrYHet", >> "chrM", "chrXHet", "chr2LHet", "chrU", >> "chr3L", "chr3R", "chr2R", "chrX") >> >> Then: >> >> > sortSeqlevels(seqlevels) >> [1] "chr2R" "chr3L" "chr3R" "chr4" "chrX" "chrU" >> [7] "chrM" "chr2LHet" "chr2RHet" "chrXHet" "chrYHet" >> "chrUextra" >> >> > sortSeqlevels(seqlevels, X.is.sexchrom=TRUE) >> [1] "chr2R" "chr3L" "chr3R" "chr4" "chrX" "chrU" >> [7] "chrM" "chr2LHet" "chr2RHet" "chrXHet" "chrYHet" >> "chrUextra" >> >> H. >> >> >> >> >> It recognizes several naming conventions (chr-prefixed or not, >> roman, etc...) e.g.: >> >> > makeSeqnameIds(c("Y", "1", "9", "M", "2")) >> [1] 4 1 3 5 2 >> >> > makeSeqnameIds(c("chrY", "chrI", "chrIX", "chrM", "chrII")) >> [1] 4 1 3 5 2 >> >> I still need to improve the documentation of this function, put >> more examples, and add unit tests. It's used internally by the >> makeTranscriptDb* functions in GenomicFeatures to assign internal >> ids to the chromosomes so that all the TranscriptDb extractors >> can then return GRanges and GRangesList objects with the seqlevels >> in the "natural" order. >> >> Cheers, >> H. >> >> >> On 09/17/2013 10:23 AM, Kasper Daniel Hansen wrote: >> >> Actually, what I (and I think several others) just want is >> fixSeqnames() >> which sorts and fixes them to some convention. I don't >> really care which >> one, but I want to do some easy harmonization, preferably in >> line with >> other bioc tools. I know this is hard to do for all >> organisms, but >> having >> something that deals with the major ones (human, mouse, >> fruit fly, yeast, >> some monkeys) would be extremely valuable. If this function >> existed, I >> could just throw all my GRanges at it. >> >> Kasper >> >> >> On Tue, Sep 17, 2013 at 1:14 PM, Michael Lawrence >> <lawrence.mich...@gene.com <mailto:lawrence.michael@gene.** >> com <lawrence.mich...@gene.com>> >> >> >> wrote: >> >> >> When I hit this I usually just do: >> >> seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y")) >> >> It would be nice if there were a function for sorting >> chromosome names >> according to a specified naming convention. Like >> sortSeqlevels(gr, >> UCSCNaming()) or the alternative replacement syntax: >> seqinfo(gr) <- >> sort(seqinfo(gr), UCSCNaming()). Although sort is only >> single-dispatch. >> >> Michael >> >> >> >> >> >> >> On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey >> <st...@channing.harvard.edu >> >> <mailto:stvjc@channing.**harvard.edu<st...@channing.harvard.edu> >> >>__wrote: >> >> >> I am replying to this as Michael mentions that this >> is a powerful >> operation. Here's my >> problem that I believe is related, but I do not see >> a straightforward >> solution >> >> bhmm19uni >> >> GRanges with 559456 ranges and 4 metadata columns: >> seqnames ranges strand >> | name >> score >> <Rle> <IRanges> <Rle> >> | <character> >> <numeric> >> 1 chr1 [ 67229025, 67341224] * >> | 13_Heterochrom/lo >> 0 >> 2 chr1 [203057955, 203060154] * >> | 12_Repressed >> 0 >> 3 chr1 [ 8458427, 8486226] * >> | 13_Heterochrom/lo >> 0 >> 4 chr1 [ 16904427, 16904826] * >> | 5_Strong_Enhancer >> 0 >> 5 chr1 [ 25289427, 25293426] * >> | 11_Weak_Txn >> 0 >> ... ... ... ... >> ... ... >> ... >> 570766 chr22 [51100931, 51101330] * >> | 2_Weak_Promoter >> 0 >> 570767 chr22 [51101331, 51101530] * >> | 6_Weak_Enhancer >> 0 >> 570768 chr22 [51101531, 51101730] * >> | 2_Weak_Promoter >> 0 >> 570769 chr22 [51101731, 51178130] * >> | 13_Heterochrom/lo >> 0 >> 570770 chr22 [51178131, 51178330] * >> | 15_Repetitive/CNV >> 0 >> thick numcode >> <IRanges> <character> >> 1 [ 67001613, 67113812] 13 >> 2 [201324578, 201326777] 12 >> 3 [ 8381014, 8408813] 13 >> 4 [ 16777014, 16777413] 5 >> 5 [ 25162014, 25166013] 11 >> ... ... ... >> 570766 [49447797, 49448196] 2 >> 570767 [49448197, 49448396] 6 >> 570768 [49448397, 49448596] 2 >> 570769 [49448597, 49524996] 13 >> 570770 [49524997, 49525196] 15 >> --- >> seqlengths: >> chr1 chr10 chr11 chr5 ... >> chr2 chr21 >> chr4 >> 249250621 135534747 135006516 180915260 ... >> 243199373 48129895 >> 191154276 >> >> If I try to make a karyogram with ggbio, the >> chromosomes are in an >> unnatural order. What is the right approach to >> getting the seqinfo >> in a >> natural order with respect to chromosome names and >> lengths? >> Reassigning >> seqlevels seems dangerous but I haven't experimented >> fully. It must >> be a >> common use case but I do not see it in doc. >> >> >> >> On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence < >> lawrence.mich...@gene.com >> >> <mailto:lawrence.michael@gene.**com<lawrence.mich...@gene.com>>> >> wrote: >> >> On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès >> <hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>> >> >> wrote: >> >> Michael, >> >> >> On 05/20/2013 08:44 PM, Michael Lawrence >> wrote: >> >> >> >> >> On Mon, May 20, 2013 at 4:13 PM, Hervé >> Pagès <hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org >> >> <mailto:hpa...@fhcrc.org>>> wrote: >> >> Hi, >> >> >> On 05/20/2013 03:15 PM, Michael >> Lawrence wrote: >> >> On Mon, May 20, 2013 at 3:11 >> PM, Martin Morgan >> <mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org>>> wrote: >> >> Hi Michael -- >> >> >> On 5/20/2013 5:56 AM, >> Michael Lawrence wrote: >> >> While it's cool that >> seqlevels<- has become so >> >> flexible, >> >> I still claim >> that >> rename/keep/drop would >> be a lot easier to read, >> >> because >> >> they describe the >> high-level operation, >> and there's no reason to >> >> deprecate >> >> them. They're >> also >> easier to remember. >> For example, for dropping with >> seqlevels<-, the user >> needs >> to remember that >> "force" is necessary to drop. Much >> easier to just say >> "dropSeqlevels(), >> please". And reimplementing >> keepSeqlevels is still >> too >> complicated. Is it >> such a maintenance burden to >> have >> these simple >> wrappers sit >> on top of the >> low-level manipulators? >> >> Another suggestion: >> renameSeqlevels should not >> >> require >> >> a >> >> named vector (in >> cases >> where it is unnamed, >> require that it is parallel to >> >> the >> >> existing >> seqlevels, as >> with seqlevels<-). >> >> >> I didn't really indicate >> what drove my desire to see >> keepSeqlevels >> deprecated. keepSeqlevels, >> seqlevels<-, and isActiveSeq >> >> were >> >> developed more >> or less independently, and >> have different contracts. >> The >> different >> contracts are confusing to >> the user, as is creating a >> >> usable >> >> help system >> when there are multiple >> end points for what is a common >> operation. The help >> pages of each were >> inadequate in different ways. And >> >> because >> >> the code was >> developed independently, >> support for different objects >> >> was >> >> inconsistent. So >> actually, yes, the >> maintenance (and use) burden for the >> previous state of >> affairs was substantial. >> >> On the other hand, I agree >> that keepSeqlevels is >> >> convenient >> >> as a simple >> wrapper around >> seqlevels<-, in the same way that >> setNames >> and names<- are >> both useful. >> >> So we could iterate to >> >> keepSeqlevels <- >> function(x, value, >> ...) >> { >> seqlevels(x, >> force=TRUE) <- value >> x >> } >> >> but I'd be less >> enthusiastic about maintaining the >> >> original >> >> contract of >> keepSeqlevels, where >> keepSeqlevels(gr1, gr2), would >> have >> worked for two >> GRanges objects. >> >> >> Why would this be called >> keepSeqlevels() if, depending on how >> >> it's >> >> used, >> it will either add, drop, rename, >> or permute the seqlevels? >> Couldn't this be called >> setSeqlevels? >> >> >> I thought that new2old was necessary for >> permuting. >> >> >> The seqlevels setter has no 'new2old' arg. >> >> >> You're right that >> >> adding should be disallowed for >> keepSeqlevels(). Adding seqlevels is >> >> not >> >> a common operation. The two common >> operations are: >> * Permuting, usually because the data >> were imported in non-canonical >> order (seqnameStyle was designed to >> address this, no?). >> >> >> Permuting is straightforward with seqlevels<-: >> >> > seqlevels(gr) >> [1] "chr1" "chr2" "chr3" >> >> > seqlevels(gr) <- rev(seqlevels(gr)) >> >> > seqlevels(gr) >> [1] "chr3" "chr2" "chr1" >> >> >> * Subsetting, either by keeping or >> dropping. >> >> >> >> Also straightforward: >> >> > seqlevels(gr, force=TRUE) <- "chr2" >> >> > seqlevels(gr) >> [1] "chr2" >> >> >> Two main reasons: taking a >> >> small slice of the data for prototyping, >> or removing problematic >> chromosomes (sex, circular). This goes >> back to bsapply and the >> >> exclude >> >> argument. >> >> >> There are other important use cases: >> >> - Dropping seqlevels that are *not* in >> use. This happens for >> example >> when subsetting a BAM file with >> Rsamtools::filterBam to keep >> only >> the alignments located on a given >> chromosome. The entire >> sequence >> dictionary (in the header) is >> propagated to the resulting BAM >> file >> so when you read the file with >> readGAlignments() you end up >> with a >> bunch of useless seqlevels that you >> generally start by dropping. >> You don't want to drop any alignment >> so force=FALSE is your >> >> friend. >> >> >> >> This is sort of tangential to the discussion, >> but do you really >> want to >> >> do >> >> this? I would preserve the universe as given by >> the BAM. >> >> >> - An even more common operation is >> renaming: 90% of the times >> that's >> what I use seqlevels<- for, and that's >> what I tell people to use >> when they need to rename their >> chromosomes. Also >> straightforward: >> >> > seqlevels(gr) >> [1] "chr1" "chr2" "chr3" "chrM" >> >> > seqlevels(gr) <- sub("^chr", "", >> seqlevels(gr)) >> > seqlevels(gr) >> [1] "1" "2" "3" "M" >> >> > seqlevels(gr)[seqlevels(gr) == >> "M"] <- "MT" >> > seqlevels(gr) >> [1] "1" "2" "3" "MT" >> >> Note that this is simpler to use than >> renameSeqlevels() which >> always required you to build the >> entire right value as a named >> vector. >> >> >> Sure, renaming is a common use case. Not sure >> how I forgot that. >> >> As I wrote earlier in the thread, >> renameSeqlevels should be changed so >> >> as >> >> not to require naming of the vector. >> >> >> So the added-value of keepSeqlevels() seems >> to boil down to: >> >> (a) it always uses force=TRUE >> >> (b) it preserves the original object and >> returns the modified >> object >> >> If you want to restrict the use of >> keepSeqlevels() to permuting and >> dropping, you'll also have to disallow >> renaming (in addition to >> >> adding). >> >> Then its name will more or less reflect what >> it does (if "keep" means >> "permute" or "drop"). The final result will >> be something that does >> >> less >> >> than setSeqlevels() and that is not >> necessarily easier to read: both >> will set on the object whatever vector of >> seqlevels they are >> supplied, >> except that one will fail when doing so >> would actually result in >> >> adding >> >> or renaming seqlevels. >> >> >> So it looks like seqlevels<- is pretty powerful. >> Now I see that if I >> scroll >> down to the examples in the man page, I find the >> actual documentation. >> It's >> cool to learn that we can replace seqlevels on >> TxDb objects. My >> argument >> has always been that since these low-level >> operations are so powerful, >> it's >> nice to have high-level operations to clarify >> the code. We have to >> think >> hard to know what the RHS will do in that >> replacement. It's >> probably one >> of >> the most complex replacement operations; far >> more complex than the >> >> typical >> >> one, including levels<- on factor. There's >> nothing wrong with that; >> it's >> just complexity that we should be able to hide. >> >> Michael >> >> H. >> >> >> >> Michael >> >> >> H. >> >> >> >> Well, I wasn't even aware of >> that feature, so you've made >> >> your >> >> point about >> the documentation ;) Sounds >> like a good compromise to me. >> >> Thanks for understanding, >> Michael >> >> >> >> Martin >> >> >> Michael >> >> >> >> >> >> >> On Sat, May 18, 2013 >> at 6:00 PM, Martin Morgan >> <mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org> >> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org>> >> >> <mailto:mtmor...@fhcrc.org >> <mailto:mtmor...@fhcrc.org> <mailto: >> >> mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> >> >> >> wrote: >> >> On 05/18/2013 >> 05:42 PM, Martin Morgan wrote: >> >> Some of the >> most common operations are >> straight-forward, e.g., >> >> > gr = >> GRanges(paste0("chr", >> c(1:22, >> "X", "Y")), IRanges(1, >> 100)) >> > >> seqlevels(gr) = sub("chr", "ch", >> seqlevels(gr)) >> >> >> To get a more >> comprehensive example I should >> >> have >> >> followed my own >> advice and >> grabbed from the >> help page! >> >> ## Rename: >> >> seqlevels(txdb) <- sub("chr", "", >> seqlevels(txdb)) >> >> seqlevels(txdb) >> >> >> seqlevels(txdb) <- paste0("CH", >> seqlevels(txdb)) >> >> seqlevels(txdb) >> >> >> seqlevels(txdb)[seqlevels(__**** >> __**__txdb) >> >> >> == >> >> >> >> "CHM"] <- "M" >> >> >> >> seqlevels(txdb) >> >> >> -- >> Computational >> Biology / Fred Hutchinson >> Cancer >> Research Center >> 1100 Fairview >> Ave. N. >> PO Box 19024 >> Seattle, WA 98109 >> >> Location: Arnold >> Building M1 B861 >> Phone: (206) >> 667-2793 <tel:%28206%29%20667-2793> >> >> <tel:%28206%29%20667-2793> >> >> >> <tel:%28206%29%20667-2793> >> >> >> >> >> -- >> Dr. Martin Morgan, PhD >> >> Fred Hutchinson Cancer >> Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA >> 98109 >> >> >> [[alternative HTML >> version deleted]] >> >> >> ______________________________** >> __**___________________ >> Bioc-devel@r-project.org >> >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org> >> > >> <mailto:Bioc-devel@r-project. >> <mailto:Bioc-devel@r-project.>***__*org< >> >> Bioc-devel@r-project.org >> >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org> >> >> >> >> >> mailing list >> https://stat.ethz.ch/mailman/_** >> __**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/___**_listinfo/bioc-devel> >> <https://stat.ethz.ch/mailman/** >> _**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel> >> >< >> >> https://stat.ethz.ch/mailman/_** >> ___listinfo/bioc-devel<https://stat.ethz.ch/mailman/____listinfo/bioc-devel> >> <https://stat.ethz.ch/mailman/** >> __listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> >> >> >> >> <https://stat.ethz.ch/mailman/** >> __**listinfo/bioc-devel<https://stat.ethz.ch/mailman/__**listinfo/bioc-devel>< >> https://stat.ethz.ch/mailman/****listinfo/bioc-devel<https://stat.ethz.ch/mailman/**listinfo/bioc-devel> >> >< >> >> https://stat.ethz.ch/mailman/_** >> _listinfo/bioc-devel <https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> <https://stat.ethz.ch/mailman/** >> listinfo/bioc-devel <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>> >> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> >> Fred Hutchinson Cancer Research >> Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org> >> <mailto:hpa...@fhcrc.org >> >> <mailto:hpa...@fhcrc.org>> >> Phone: (206) 667-5791 >> <tel:%28206%29%20667-5791> >> <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 >> <tel:%28206%29%20667-1319> >> <tel:%28206%29%20667-1319> >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org >> <mailto:hpa...@fhcrc.org> >> Phone: (206) 667-5791<tel:%28206%29%20667-5791> >> Fax: (206) 667-1319<tel:%28206%29%20667-1319> >> >> >> [[alternative HTML version deleted]] >> >> >> ______________________________** >> ___________________ >> Bioc-devel@r-project.org >> >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>> >> mailing list >> https://stat.ethz.ch/mailman/_** >> _listinfo/bioc-devel <https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> <https://stat.ethz.ch/mailman/** >> listinfo/bioc-devel <https://stat.ethz.ch/mailman/listinfo/bioc-devel>> >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> ______________________________**___________________ >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org>> >> mailing list >> >> https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> > >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> ______________________________**___________________ >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.**org<Bioc-devel@r-project.org> >> > >> mailing list >> >> https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel<https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel<https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> > >> >> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel