Thank you Michael,

Attached is the example file, since I noticed you were unable to download it
from GitLab.
I will continue the discussion there, then :-)

Aditya

________________________________
From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Thursday, October 17, 2019 11:45 AM
To: Bhagwat, Aditya
Cc: Stuart Lee; Michael Lawrence; bioc-devel@r-project.org
Subject: Re: plyranges group_by

I replied on the support site. Let's move the discussion there.

On Thu, Oct 17, 2019 at 1:24 AM Bhagwat, Aditya 
<aditya.bhag...@mpi-bn.mpg.de> wrote:
Thank you Stuart and Michael for your feedback.

Stuart, in response to your request for more context regarding my use case, I 
have updated my recent BioC support post 
(https://support.bioconductor.org/p/125623/), which now provides all use-case 
details.

Michael, I haven't tried selfmatch yet, but Stuart's reply seems to suggest 
that it would not match the data.table performance (which is practically 
instantaneous).

As a general question, do you think it would be useful to add 
data.table-based split-apply-combine functionality to plyranges (so that 
end-user operations remain GRanges-only)? I wouldn't mind writing a function to 
do that (on GitHub), but I first need your feedback as to whether you think 
that would be useful :-)

Aditya


________________________________
From: Stuart Lee [le...@wehi.edu.au]
Sent: Thursday, October 17, 2019 3:01 AM
To: Michael Lawrence
Cc: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: plyranges group_by

Currently, the way grouping indices are generated is pretty slow if you’re 
doing stuff rowwise. Michael’s suggestion of using selfmatch should speed 
things up a bit. What are you planning to do after grouping? I’ve found there’s 
usually a way to do stuff without rowwise grouping, but it really depends on 
what you’re after. Regarding your other issue, would you mind filing it as a 
GitHub issue?
—
Stuart Lee
Visiting PhD Student - Ritchie Lab



On 16 Oct 2019, at 22:54, Michael Lawrence 
<lawrence.mich...@gene.com> wrote:

Just a note that in this particular case, selfmatch(annotatedsrf) would be a 
fast way to generate a grouping vector, like plyranges::group_by(annotatedsrf, 
selfmatch(annotatedsrf)).

Michael
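
A minimal sketch of the selfmatch idea above, assuming a made-up three-range 
GRanges rather than the SRF data from this thread. The names gr, grp, and 
result are illustrative only, and the base R tapply() step is just one 
possible way to aggregate once the grouping vector exists.

```r
# Toy data standing in for annotatedsrf (hypothetical, not from the thread).
library(GenomicRanges)  # also attaches S4Vectors, which provides selfmatch()

gr <- GRanges(
    c("chr1:1-10:+", "chr1:1-10:+", "chr2:5-20:-"),
    gene_id = c("geneA", "geneB", "geneC")
)

# For each range, selfmatch() gives the index of its first identical
# occurrence, so identical ranges share one group id.
grp <- selfmatch(gr)

# Collapse gene ids within each group, then keep one range per group.
collapsed <- tapply(gr$gene_id, grp, paste0, collapse = ";")
first <- !duplicated(grp)
result <- gr[first]
result$gene_id <- as.character(collapsed[as.character(grp[first])])
result
```

The grouping vector is computed in one vectorised call, which is why it 
avoids the cost of building groups from seqnames, start, end, and strand 
separately.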

On Wed, Oct 16, 2019 at 2:48 AM Bhagwat, Aditya 
<aditya.bhag...@mpi-bn.mpg.de> wrote:
Hi Stuart, Michael,

Your plyranges package is really cool - I am now using it for left-joining 
GRanges (I am facing a minor issue there, see 
https://support.bioconductor.org/p/125623/, but that is not the topic of 
this email - I have been asked by Lori not to double-post :-)).

This email is about the plyranges functionality for grouping GRanges.
That is cool, but I found it to be not so performant for large numbers of 
ranges.
My R session hangs when I do:

bedfile <- paste0('https://gitlab.gwdg.de/loosolab/software/multicrispr/wikis',
                  '/uploads/a51e98516c1e6b71441f5b5a5f741fa1/SRF.bed')
srfranges <- rtracklayer::import.bed(bedfile, genome = 'mm10')
txdb <- TxDb.Mmusculus.UCSC.mm10.ensGene::TxDb.Mmusculus.UCSC.mm10.ensGene
generanges <- GenomicFeatures::genes(txdb)
annotatedsrf <- plyranges::join_overlap_left(srfranges, generanges)
plyranges::group_by(annotatedsrf, seqnames, start, end, strand)

For my purposes, I worked around it by performing a groupby in data.table:

data.table::as.data.table(annotatedsrf)[
    !is.na(gene_id),
    gene_id := paste0(gene_id, collapse = ';'),
    by = c('seqnames', 'start', 'end', 'strand')]

I was wondering, in general, whether it would be useful to have a 
data.table-based backend for plyranges::group_by(),
and whether all of this is actually a non-issue caused by my improper use of 
plyranges::group_by.

Thank you for your feedback :-)

Aditya




--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com





_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
