Currently, the way grouping indices are generated is pretty slow if you're grouping rowwise. Michael's suggestion of using selfmatch should speed things up a bit. What are you planning to do after grouping? I've found there's usually a way to do things without rowwise grouping, but it really depends on what you're after.

Re your other issue, would you mind filing it as a GitHub issue?

--
Stuart Lee
Visiting PhD Student - Ritchie Lab
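For reference, the selfmatch idea can be sketched on toy data as follows. This is only a minimal illustration, not code from the thread: the `annotated` object, its coordinates, the gene ids, and the `collapse_ids` helper are all made up, and it assumes GenomicRanges (which attaches S4Vectors and its `selfmatch()` generic) is installed. It computes one grouping index per range and then collapses the gene ids within each group using base R.

```r
library(GenomicRanges)  # attaches S4Vectors, which provides selfmatch()

# Toy stand-in (hypothetical) for the result of an overlap left join:
# the first two rows are the same range matched to two different genes.
annotated <- GRanges(
  seqnames = c("chr1", "chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 100, 500, 100),
                     end   = c(200, 200, 600, 200)),
  strand   = c("+", "+", "-", "+"),
  gene_id  = c("geneA", "geneB", "geneC", NA)
)

# For each range, the index of the first identical range
# (same seqnames, start, end and strand). Usable as a grouping vector.
grp <- selfmatch(annotated)

# Hypothetical helper: join the non-missing ids of one group, keep NA if none.
collapse_ids <- function(ids) {
  ids <- ids[!is.na(ids)]
  if (length(ids) == 0) NA_character_ else paste(ids, collapse = ";")
}

# One collapsed id string per group; names are the first-occurrence indices.
gene_ids <- vapply(split(annotated$gene_id, grp), collapse_ids, character(1))

# Keep the first range of each group and attach the collapsed ids.
collapsed <- annotated[as.integer(names(gene_ids))]
collapsed$gene_id <- unname(gene_ids)
```

The same `grp` vector is what Michael's note proposes passing to plyranges::group_by in place of the four coordinate columns.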
On 16 Oct 2019, at 22:54, Michael Lawrence <lawrence.mich...@gene.com> wrote:

Just a note that in this particular case, selfmatch(annotatedsrf) would be a fast way to generate a grouping vector, like plyranges::group_by(annotatedsrf, selfmatch(annotatedsrf)).

Michael

On Wed, Oct 16, 2019 at 2:48 AM Bhagwat, Aditya <aditya.bhag...@mpi-bn.mpg.de> wrote:

Hi Stuart, Michael,

Your plyranges package is really cool - now I am using it for left joining GRanges (I am facing a minor issue there (https://support.bioconductor.org/p/125623/), but that is not the topic of this email - I have been asked by Lori not to double-post :-)).

This email is about the plyranges functionality for grouping GRanges. That is cool, but I found it to be not so performant for large numbers of ranges. My R session hangs when I do:

    # Import the SRF ranges and annotate them with overlapping genes
    bedfile <- paste0('https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
                      '/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
    srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
    txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
    generanges <- GenomicFeatures::genes(txdb)
    annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
    # This is the call that hangs
    plyranges::group_by(annotatedsrf, seqnames, start, end, strand)

For my purposes, I worked around it by performing a group-by in data.table:

    # Collapse the gene ids of each range into one ';'-separated string
    data.table::as.data.table(annotatedsrf)[
        !is.na(gene_id),
        gene_id := paste0(gene_id, collapse = ';'),
        by = c('seqnames', 'start', 'end', 'strand')]

And I was wondering, in general, whether it would be useful to have a data.table-based backend for plyranges::group_by(), and whether all of this is actually a non-issue due to my improper use of plyranges::group_by.
Thank you for the feedback :-)

Aditya

--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel