I'd suggest separating the operations on the data from the interface. You can have both, one layer for programming and another for interactive analysis.
On Wed, Sep 18, 2019 at 5:05 AM Bhagwat, Aditya <aditya.bhag...@mpi-bn.mpg.de> wrote: > > In the end I endeavour to end up with a handful of verbs, with which I can do > all tasks in a project. > > Regarding the BED files: they're basic bed files, with additional metadata > columns to allow traceback. But for the purpose of multicrispr, non need to > restrict to those files only. You extraCols works great for me. And for > multicrispr examples, I have removed the metadata cols to keep things simple. > You were right btw that things went wrong earlier in the column stripping > process. > > Aditya > > > ________________________________________ > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, > Aditya [aditya.bhag...@mpi-bn.mpg.de] > Sent: Wednesday, September 18, 2019 1:57 PM > To: Michael Lawrence > Cc: bioc-devel@r-project.org > Subject: Re: [Bioc-devel] read_bed() > > Hi Michael, > > That's a software design dilemma I've encountered a few times. > > One approach is to keep the "verb" functions bare. E.g. read_bed would only > read a bedfile, and plot_bed would somehow plot it. Advantage: if read_bed > doesn't depend on anything else, other functions can depend on it, which > makes dependency handling easier. > > Another intention is to make verb functions "intuitive". In that scenario, I > try for each operation to also output a visual image of the operation, to > make it easier to see at a glance what is happening. E.g. for the range > operations in multicrispr, the function plot_intervals visually shows what > operation is being performed, making it easier to both spot errors as well as > maintain focus. > > In the case of read_bed, I thought of wrapping around your excellent > core-level rtracklayer::import(), additionally providing the textual and > visual feedback which I intent to give. > > Interesting to hear your suggestions on this topic, though. > > Aditya > > > ________________________________________ > From: Michael Lawrence [lawrence.mich...@gene.com] > Sent: Wednesday, September 18, 2019 1:33 PM > To: Bhagwat, Aditya > Cc: Michael Lawrence; bioc-devel@r-project.org > Subject: Re: [Bioc-devel] read_bed() > > I'm not sure if a function called read_bed() should be plotting or > printing. Is your BED file a known BED variant, i.e., maybe there is a > better name for the file type than "bed"? > > > On Wed, Sep 18, 2019 at 3:17 AM Bhagwat, Aditya > <aditya.bhag...@mpi-bn.mpg.de> wrote: > > > > Actually, > > > > I will keep multicrispr::read_bed(), but wrap it around > > rtracklayer::import.bed, and additionally plot and print range summaries. > > > > Aditya > > > > ________________________________________ > > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, > > Aditya [aditya.bhag...@mpi-bn.mpg.de] > > Sent: Wednesday, September 18, 2019 11:31 AM > > To: Michael Lawrence > > Cc: bioc-devel@r-project.org > > Subject: Re: [Bioc-devel] read_bed() > > > > (Typo corrected to avoid confusion) > > > > Michael, > > > > rtracklayer::import.bed() indeed works perfectly for me, so I am dropping > > multicrispr::read_bed(). > > > > In order to avoid the overkill of `require(tracklayer)` for multicrispr > > <https://gitlab.gwdg.de/loosolab/software/multicrispr> users, does it make > > sense to import/re-export import.bed() in multicrispr? What is BioC > > convention/best practice in such cases? > > > > Aditya > > > > > > > > ________________________________________ > > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, > > Aditya [aditya.bhag...@mpi-bn.mpg.de] > > Sent: Wednesday, September 18, 2019 8:35 AM > > To: Michael Lawrence > > Cc: bioc-devel@r-project.org > > Subject: Re: [Bioc-devel] read_bed() > > > > Thank you Michael :-) > > > > Aditya > > ________________________________________ > > From: Michael Lawrence [lawrence.mich...@gene.com] > > Sent: Tuesday, September 17, 2019 8:49 PM > > To: Bhagwat, Aditya > > Cc: Michael Lawrence; bioc-devel@r-project.org > > Subject: Re: [Bioc-devel] read_bed() > > > > I think you probably made a mistake when dropping the columns. When I > > provide the extraCols= argument (inventing my own names for things), > > it almost works, but it breaks due to NAs in the extra columns. The > > "." character is the standard way to express NA in BED files. I've > > added support for extra na.strings to version 1.45.6. > > > > For reference, the call is like: > > > > import("SRF.bed", extraCols=c(chr2="character", start2="integer", > > end2="integer", mDux="factor", type="factor", pos1="integer", > > pos2="integer", strand2="factor", from="factor", n="integer", > > code="character", anno="factor", id="character", biotype="character", > > score2="numeric" ), na.strings="NA") > > > > > > On Tue, Sep 17, 2019 at 7:23 AM Bhagwat, Aditya > > <aditya.bhag...@mpi-bn.mpg.de> wrote: > > > > > > Hi Michael, > > > > > > I removed the additional metadata columns in SRF.bed > > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed > > > > > > But still can't get rtracklayer::import.bed working: > > > > > > > rtracklayer::import.bed(bedfile) > > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = > > > dec, : scan() expected 'a real', got '1.168.595' > > > > bedfile > > > [1] > > > "C:/Users/abhagwa/Documents/R/R-3.6.1/library/multicrispr/extdata/SRF.bed" > > > > > > Never mind, multicrispr function read_bed, based on data.table::fread is > > > doing the job, so I will stick to that . > > > > > > Thank you for all feedback, > > > > > > Cheers, > > > > > > Aditya > > > > > > > > > ________________________________________ > > > From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of Bhagwat, > > > Aditya [aditya.bhag...@mpi-bn.mpg.de] > > > Sent: Tuesday, September 17, 2019 2:48 PM > > > To: Michael Lawrence > > > Cc: bioc-devel@r-project.org > > > Subject: Re: [Bioc-devel] read_bed() > > > > > > Oh :-) - Thankyou for explaining! > > > ________________________________________ > > > From: Michael Lawrence [lawrence.mich...@gene.com] > > > Sent: Tuesday, September 17, 2019 2:40 PM > > > To: Bhagwat, Aditya > > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org > > > Subject: Re: [Bioc-devel] read_bed() > > > > > > Having a "." in the function name does not make something "S3". > > > There's no dispatch from import() to import.bed(). Had I not been a > > > total newb when I created rtracklayer, I would have called the > > > function importBed() or something like that. Sorry for the confusion. > > > > > > On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya > > > <aditya.bhag...@mpi-bn.mpg.de> wrote: > > > > > > > > Oh, superb, thx! > > > > > > > > Interesting ... here you use S3 rather than S4 - I wonder the design > > > > intention underlying these choices (I'm asking because I am trying to > > > > figure out myself when to use S3 and when to use S4 and whether to mix > > > > the two). > > > > > > > > Aditya > > > > > > > > ________________________________________ > > > > From: Michael Lawrence [lawrence.mich...@gene.com] > > > > Sent: Tuesday, September 17, 2019 2:23 PM > > > > To: Bhagwat, Aditya > > > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org > > > > Subject: Re: [Bioc-devel] read_bed() > > > > > > > > The generic documentation does not mention it, but see ?import.bed. > > > > It's similar to colClasses on read.table(). > > > > > > > > On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya > > > > <aditya.bhag...@mpi-bn.mpg.de> wrote: > > > > > > > > > > Thankyou Michael, > > > > > > > > > > How do I use the extraCols argument? The documentation does not > > > > > mention an `extraCols` argument explicitly, so it must be one of the > > > > > ellipsis arguments, but `?rtracklayer::import` does not mention it. > > > > > Should I say extraCols = 10 (ten extra columns) or so? > > > > > > > > > > Aditya > > > > > > > > > > ________________________________________ > > > > > From: Michael Lawrence [lawrence.mich...@gene.com] > > > > > Sent: Tuesday, September 17, 2019 2:05 PM > > > > > To: Bhagwat, Aditya > > > > > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org > > > > > Subject: Re: [Bioc-devel] read_bed() > > > > > > > > > > It breaks it because it's not standard BED; however, using the > > > > > extraCols= argument should work in this case. Requiring an explicit > > > > > format specification is intentional, because it provides validation > > > > > and type safety, and it communicates the format to a future reader. > > > > > This also looks a bit like a bedPE file, so you might consider using > > > > > the Pairs data structure. > > > > > > > > > > Michael > > > > > > > > > > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya > > > > > <aditya.bhag...@mpi-bn.mpg.de> wrote: > > > > > > > > > > > > Hi Michael, > > > > > > > > > > > > Yeah, I also noticed that the attachment was eaten when it entered > > > > > > the bio-devel list. > > > > > > > > > > > > The file is also accessible in the extdata of the multicrispr: > > > > > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed > > > > > > > > > > > > A bedfile to GRanges importer requires columns 1 (chrom), 2 > > > > > > (chromStart), 3 (chromEnd), and column 6 (strand). All of these are > > > > > > present in SRF.bed. > > > > > > > > > > > > I am curious as to why you feel that having additional columns in a > > > > > > bedfile would break it? > > > > > > > > > > > > Cheers, > > > > > > > > > > > > Aditya > > > > > > > > > > > > ________________________________________ > > > > > > From: Michael Lawrence [lawrence.mich...@gene.com] > > > > > > Sent: Tuesday, September 17, 2019 1:41 PM > > > > > > To: Bhagwat, Aditya > > > > > > Cc: Shepherd, Lori; bioc-devel@r-project.org > > > > > > Subject: Re: [Bioc-devel] read_bed() > > > > > > > > > > > > I don't see an attachment, nor can I find the multicrispr package > > > > > > anywhere. The "addressed soon" was referring to the BEDX+Y formats, > > > > > > which was addressed many years ago, so I've updated the > > > > > > documentation. > > > > > > Broken BED files will never be supported. > > > > > > > > > > > > Michael > > > > > > > > > > > > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya > > > > > > <aditya.bhag...@mpi-bn.mpg.de> wrote: > > > > > > > > > > > > > > Hi Lori, > > > > > > > > > > > > > > I remember now - I tried this function earlier, but it does not > > > > > > > work for my bedfiles, like the one in attach. > > > > > > > > > > > > > > > bedfile <- system.file('extdata/SRF.bed', package = > > > > > > > > 'multicrispr') > > > > > > > > > > > > > > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', > > > > > > > > genome = 'mm10') > > > > > > > Error in scan(file = file, what = what, sep = sep, quote = quote, > > > > > > > dec = dec, : scan() expected 'an integer', got 'chr2' > > > > > > > > > > > > > > > > > > > > > > Perhaps this sentence in `?rtracklayer::import` points to the > > > > > > > source of the error? > > > > > > > > > > > > > > many tools and organizations have extended BED with additional > > > > > > > columns. These are not officially valid BED files, and as such > > > > > > > rtracklayer does not yet support them (this will be addressed > > > > > > > soon). > > > > > > > > > > > > > > Which brings the question: how soon is soon :-D ? > > > > > > > > > > > > > > Aditya > > > > > > > > > > > > > > > > > > > > > ________________________________ > > > > > > > From: Shepherd, Lori [lori.sheph...@roswellpark.org] > > > > > > > Sent: Tuesday, September 17, 2019 1:02 PM > > > > > > > To: Bhagwat, Aditya; bioc-devel@r-project.org > > > > > > > Subject: Re: read_bed() > > > > > > > > > > > > > > Please look at rtracklayer::import() function that we recommend > > > > > > > for reading of BAM files along with other common formats. > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > > > > > > > Lori Shepherd > > > > > > > > > > > > > > Bioconductor Core Team > > > > > > > > > > > > > > Roswell Park Cancer Institute > > > > > > > > > > > > > > Department of Biostatistics & Bioinformatics > > > > > > > > > > > > > > Elm & Carlton Streets > > > > > > > > > > > > > > Buffalo, New York 14263 > > > > > > > > > > > > > > ________________________________ > > > > > > > From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of > > > > > > > Bhagwat, Aditya <aditya.bhag...@mpi-bn.mpg.de> > > > > > > > Sent: Tuesday, September 17, 2019 6:58 AM > > > > > > > To: bioc-devel@r-project.org <bioc-devel@r-project.org> > > > > > > > Subject: [Bioc-devel] read_bed() > > > > > > > > > > > > > > Dear bioc-devel, > > > > > > > > > > > > > > I had two feedback requests regarding the function read_bed(). > > > > > > > > > > > > > > 1) Did I overlook, and therefore, re-invent existing > > > > > > > functionality? > > > > > > > 2) If not, would `read_bed` be suited for existence in a more > > > > > > > foundational package, e.g. `GenomicRanges`, given the rather > > > > > > > basal nature of this functionality? > > > > > > > > > > > > > > It reads a bedfile into a GRanges, converts the coordinates from > > > > > > > 0-based (bedfile) to 1-based > > > > > > > (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome info > > > > > > > (to allow for implicit range validity > > > > > > > checking<https://support.bioconductor.org/p/124250>) and plots > > > > > > > the karyogram<https://support.bioconductor.org/p/124328>. > > > > > > > > > > > > > > Thank you for your feedback. > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > Aditya > > > > > > > > > > > > > > > > > > > > > #' Read bedfile into GRanges > > > > > > > #' > > > > > > > #' @param bedfile file path > > > > > > > #' @param bsgenome BSgenome, e.g. > > > > > > > BSgenome.Mmusculus.UCSC.mm10::Mmusculus > > > > > > > #' @param zero_based logical(1): whether bedfile GRanges are > > > > > > > 0-based > > > > > > > #' @param rm_duplicates logical(1) > > > > > > > #' @param plot logical(1) > > > > > > > #' @param verbose logical(1) > > > > > > > #' @return \code{\link[GenomicRanges]{GRanges-class}} > > > > > > > #' @note By convention BED files are 0-based. GRanges are always > > > > > > > 1-based. > > > > > > > #' A good discussion on these two alternative codings is given > > > > > > > #' by Obi Griffith on Biostars: https://www.biostars.org/p/84686/ > > > > > > > #' @examples > > > > > > > #' bedfile <- system.file('extdata/SRF.bed', package = > > > > > > > 'multicrispr') > > > > > > > #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus > > > > > > > #' (gr <- read_bed(bedfile, bsgenome)) > > > > > > > #' @importFrom data.table := > > > > > > > #' @export > > > > > > > read_bed <- function( > > > > > > > bedfile, > > > > > > > bsgenome, > > > > > > > zero_based = TRUE, > > > > > > > rm_duplicates = TRUE, > > > > > > > plot = TRUE, > > > > > > > verbose = TRUE > > > > > > > ){ > > > > > > > # Assert > > > > > > > assert_all_are_existing_files(bedfile) > > > > > > > assert_is_a_bool(verbose) > > > > > > > assert_is_a_bool(rm_duplicates) > > > > > > > assert_is_a_bool(zero_based) > > > > > > > > > > > > > > # Comply > > > > > > > seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL > > > > > > > > > > > > > > # Read > > > > > > > if (verbose) cmessage('\tRead %s', bedfile) > > > > > > > dt <- data.table::fread(bedfile, select = c(seq_len(3), 6), > > > > > > > col.names = c('seqnames', 'start', 'end', 'strand')) > > > > > > > data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand')) > > > > > > > > > > > > > > # Transform coordinates: 0-based -> 1-based > > > > > > > if (zero_based){ > > > > > > > if (verbose) cmessage('\t\tConvert 0 -> 1-based') > > > > > > > dt[, start := start + 1] > > > > > > > } > > > > > > > > > > > > > > if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes', > > > > > > > nrow(dt), length(unique(dt$seqnames))) > > > > > > > > > > > > > > # Drop duplicates > > > > > > > if (rm_duplicates){ > > > > > > > is_duplicated <- cduplicated(dt) > > > > > > > if (any(is_duplicated)){ > > > > > > > if (verbose) cmessage('\t\t %d after removing duplicates') > > > > > > > dt %<>% extract(!duplicated) > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > # Turn into GRanges > > > > > > > gr <- add_seqinfo(as(dt, 'GRanges'), bsgenome) > > > > > > > > > > > > > > # Plot and return > > > > > > > title <- paste0(providerVersion(bsgenome), ': ', > > > > > > > basename(bedfile)) > > > > > > > if (plot) plot_karyogram(gr, title) > > > > > > > gr > > > > > > > } > > > > > > > > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Bioc-devel@r-project.org mailing list > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > > > > > > > > > > > > > This email message may contain legally privileged and/or > > > > > > > confidential information. If you are not the intended > > > > > > > recipient(s), or the employee or agent responsible for the > > > > > > > delivery of this message to the intended recipient(s), you are > > > > > > > hereby notified that any disclosure, copying, distribution, or > > > > > > > use of this email message is prohibited. If you have received > > > > > > > this message in error, please notify the sender immediately by > > > > > > > e-mail and delete this email message from your computer. Thank > > > > > > > you. > > > > > > > _______________________________________________ > > > > > > > Bioc-devel@r-project.org mailing list > > > > > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Michael Lawrence > > > > > > Scientist, Bioinformatics and Computational Biology > > > > > > Genentech, A Member of the Roche Group > > > > > > Office +1 (650) 225-7760 > > > > > > micha...@gene.com > > > > > > > > > > > > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | > > > > > > YouTube > > > > > > > > > > > > > > > > > > > > -- > > > > > Michael Lawrence > > > > > Scientist, Bioinformatics and Computational Biology > > > > > Genentech, A Member of the Roche Group > > > > > Office +1 (650) 225-7760 > > > > > micha...@gene.com > > > > > > > > > > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube > > > > > > > > > > > > > > > > -- > > > > Michael Lawrence > > > > Scientist, Bioinformatics and Computational Biology > > > > Genentech, A Member of the Roche Group > > > > Office +1 (650) 225-7760 > > > > micha...@gene.com > > > > > > > > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube > > > > > > > > > > > > -- > > > Michael Lawrence > > > Scientist, Bioinformatics and Computational Biology > > > Genentech, A Member of the Roche Group > > > Office +1 (650) 225-7760 > > > micha...@gene.com > > > > > > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube > > > > > > _______________________________________________ > > > Bioc-devel@r-project.org mailing list > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > > > > > > > -- > > Michael Lawrence > > Scientist, Bioinformatics and Computational Biology > > Genentech, A Member of the Roche Group > > Office +1 (650) 225-7760 > > micha...@gene.com > > > > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube > > > > _______________________________________________ > > Bioc-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioc-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/bioc-devel > > > > -- > Michael Lawrence > Scientist, Bioinformatics and Computational Biology > Genentech, A Member of the Roche Group > Office +1 (650) 225-7760 > micha...@gene.com > > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube > > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel -- Michael Lawrence Scientist, Bioinformatics and Computational Biology Genentech, A Member of the Roche Group Office +1 (650) 225-7760 micha...@gene.com Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel