Oh :-) - Thankyou for explaining!
________________________________________
From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:40 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

Having a "." in the function name does not make something "S3".
There's no dispatch from import() to import.bed(). Had I not been a
total newb when I created rtracklayer, I would have called the
function importBed() or something like that. Sorry for the confusion.

On Tue, Sep 17, 2019 at 5:34 AM Bhagwat, Aditya
<aditya.bhag...@mpi-bn.mpg.de> wrote:
>
> Oh, superb, thx!
>
> Interesting ... here you use S3 rather than S4 - I wonder the design 
> intention underlying these choices (I'm asking because I am trying to figure 
> out myself when to use S3 and when to use S4 and whether to mix the two).
>
> Aditya
>
> ________________________________________
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 2:23 PM
> To: Bhagwat, Aditya
> Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> The generic documentation does not mention it, but see ?import.bed.
> It's similar to colClasses on read.table().
>
> On Tue, Sep 17, 2019 at 5:15 AM Bhagwat, Aditya
> <aditya.bhag...@mpi-bn.mpg.de> wrote:
> >
> > Thankyou Michael,
> >
> > How do I use the extraCols argument? The documentation does not mention an 
> > `extraCols` argument explicitly, so it must be one of the ellipsis 
> > arguments, but `?rtracklayer::import` does not mention it. Should I say 
> > extraCols = 10 (ten extra columns) or so?
> >
> > Aditya
> >
> > ________________________________________
> > From: Michael Lawrence [lawrence.mich...@gene.com]
> > Sent: Tuesday, September 17, 2019 2:05 PM
> > To: Bhagwat, Aditya
> > Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
> > Subject: Re: [Bioc-devel] read_bed()
> >
> > It breaks it because it's not standard BED; however, using the
> > extraCols= argument should work in this case. Requiring an explicit
> > format specification is intentional, because it provides validation
> > and type safety, and it communicates the format to a future reader.
> > This also looks a bit like a bedPE file, so you might consider using
> > the Pairs data structure.
> >
> > Michael
> >
> > On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
> > <aditya.bhag...@mpi-bn.mpg.de> wrote:
> > >
> > > Hi Michael,
> > >
> > > Yeah, I also noticed that the attachment was eaten when it entered the 
> > > bio-devel list.
> > >
> > > The file is also accessible in the extdata of the multicrispr:
> > > https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
> > >
> > > A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 
> > > 3 (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
> > >
> > > I am curious as to why you feel that having additional columns in a 
> > > bedfile would break it?
> > >
> > > Cheers,
> > >
> > > Aditya
> > >
> > > ________________________________________
> > > From: Michael Lawrence [lawrence.mich...@gene.com]
> > > Sent: Tuesday, September 17, 2019 1:41 PM
> > > To: Bhagwat, Aditya
> > > Cc: Shepherd, Lori; bioc-devel@r-project.org
> > > Subject: Re: [Bioc-devel] read_bed()
> > >
> > > I don't see an attachment, nor can I find the multicrispr package
> > > anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> > > which was addressed many years ago, so I've updated the documentation.
> > > Broken BED files will never be supported.
> > >
> > > Michael
> > >
> > > On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> > > <aditya.bhag...@mpi-bn.mpg.de> wrote:
> > > >
> > > > Hi Lori,
> > > >
> > > > I remember now - I tried this function earlier, but it does not work 
> > > > for my bedfiles, like the one in attach.
> > > >
> > > > > bedfile      <- system.file('extdata/SRF.bed', package = 
> > > > > 'multicrispr')
> > > > >
> > > > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > > > 'mm10')
> > > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > > > dec,  : scan() expected 'an integer', got 'chr2'
> > > > >
> > > >
> > > > Perhaps this sentence in `?rtracklayer::import` points to the source of 
> > > > the error?
> > > >
> > > > many tools and organizations have extended BED with additional columns. 
> > > > These are not officially valid BED files, and as such rtracklayer does 
> > > > not yet support them (this will be addressed soon).
> > > >
> > > > Which brings the question: how soon is soon :-D ?
> > > >
> > > > Aditya
> > > >
> > > >
> > > > ________________________________
> > > > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > > > Sent: Tuesday, September 17, 2019 1:02 PM
> > > > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > > > Subject: Re: read_bed()
> > > >
> > > > Please look at rtracklayer::import()  function that we recommend for 
> > > > reading of BAM files along with other common formats.
> > > >
> > > > Cheers,
> > > >
> > > >
> > > > Lori Shepherd
> > > >
> > > > Bioconductor Core Team
> > > >
> > > > Roswell Park Cancer Institute
> > > >
> > > > Department of Biostatistics & Bioinformatics
> > > >
> > > > Elm & Carlton Streets
> > > >
> > > > Buffalo, New York 14263
> > > >
> > > > ________________________________
> > > > From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of 
> > > > Bhagwat, Aditya <aditya.bhag...@mpi-bn.mpg.de>
> > > > Sent: Tuesday, September 17, 2019 6:58 AM
> > > > To: bioc-devel@r-project.org <bioc-devel@r-project.org>
> > > > Subject: [Bioc-devel] read_bed()
> > > >
> > > > Dear bioc-devel,
> > > >
> > > > I had two feedback requests regarding the function read_bed().
> > > >
> > > > 1) Did I overlook, and therefore, re-invent existing functionality?
> > > > 2) If not, would `read_bed` be suited for existence in a more 
> > > > foundational package, e.g. `GenomicRanges`, given the rather basal 
> > > > nature of this functionality?
> > > >
> > > > It reads a bedfile into a GRanges, converts the coordinates from 
> > > > 0-based (bedfile) to 1-based 
> > > > (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome info (to 
> > > > allow for implicit range validity 
> > > > checking<https://support.bioconductor.org/p/124250>) and plots the 
> > > > karyogram<https://support.bioconductor.org/p/124328>.
> > > >
> > > > Thank you for your feedback.
> > > >
> > > > Cheers,
> > > >
> > > > Aditya
> > > >
> > > >
> > > > #' Read bedfile into GRanges
> > > > #'
> > > > #' @param bedfile        file path
> > > > #' @param bsgenome       BSgenome, e.g. 
> > > > BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> > > > #' @param zero_based     logical(1): whether bedfile GRanges are 0-based
> > > > #' @param rm_duplicates  logical(1)
> > > > #' @param plot           logical(1)
> > > > #' @param verbose        logical(1)
> > > > #' @return \code{\link[GenomicRanges]{GRanges-class}}
> > > > #' @note By convention BED files are 0-based. GRanges are always 
> > > > 1-based.
> > > > #'       A good discussion on these two alternative codings is given
> > > > #'       by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> > > > #' @examples
> > > > #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > > > #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> > > > #' (gr <- read_bed(bedfile, bsgenome))
> > > > #' @importFrom  data.table  :=
> > > > #' @export
> > > > read_bed <- function(
> > > >     bedfile,
> > > >     bsgenome,
> > > >     zero_based    = TRUE,
> > > >     rm_duplicates = TRUE,
> > > >     plot          = TRUE,
> > > >     verbose       = TRUE
> > > > ){
> > > >     # Assert
> > > >     assert_all_are_existing_files(bedfile)
> > > >     assert_is_a_bool(verbose)
> > > >     assert_is_a_bool(rm_duplicates)
> > > >     assert_is_a_bool(zero_based)
> > > >
> > > >     # Comply
> > > >     seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL
> > > >
> > > >     # Read
> > > >     if (verbose) cmessage('\tRead %s', bedfile)
> > > >     dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
> > > >             col.names = c('seqnames', 'start', 'end', 'strand'))
> > > >     data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))
> > > >
> > > >     # Transform coordinates: 0-based -> 1-based
> > > >     if (zero_based){
> > > >         if (verbose)    cmessage('\t\tConvert 0 -> 1-based')
> > > >         dt[, start := start + 1]
> > > >     }
> > > >
> > > >     if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
> > > >                             nrow(dt), length(unique(dt$seqnames)))
> > > >
> > > >     # Drop duplicates
> > > >     if (rm_duplicates){
> > > >         is_duplicated <- cduplicated(dt)
> > > >         if (any(is_duplicated)){
> > > >             if (verbose) cmessage('\t\t        %d after removing 
> > > > duplicates')
> > > >             dt %<>% extract(!duplicated)
> > > >         }
> > > >     }
> > > >
> > > >     # Turn into GRanges
> > > >     gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)
> > > >
> > > >     # Plot and return
> > > >     title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
> > > >     if (plot) plot_karyogram(gr, title)
> > > >     gr
> > > > }
> > > >
> > > >
> > > >         [[alternative HTML version deleted]]
> > > >
> > > > _______________________________________________
> > > > Bioc-devel@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > > >
> > > > This email message may contain legally privileged and/or confidential 
> > > > information. If you are not the intended recipient(s), or the employee 
> > > > or agent responsible for the delivery of this message to the intended 
> > > > recipient(s), you are hereby notified that any disclosure, copying, 
> > > > distribution, or use of this email message is prohibited. If you have 
> > > > received this message in error, please notify the sender immediately by 
> > > > e-mail and delete this email message from your computer. Thank you.
> > > > _______________________________________________
> > > > Bioc-devel@r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >
> > >
> > >
> > > --
> > > Michael Lawrence
> > > Scientist, Bioinformatics and Computational Biology
> > > Genentech, A Member of the Roche Group
> > > Office +1 (650) 225-7760
> > > micha...@gene.com
> > >
> > > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
> >
> >
> >
> > --
> > Michael Lawrence
> > Scientist, Bioinformatics and Computational Biology
> > Genentech, A Member of the Roche Group
> > Office +1 (650) 225-7760
> > micha...@gene.com
> >
> > Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube
>
>
>
> --
> Michael Lawrence
> Scientist, Bioinformatics and Computational Biology
> Genentech, A Member of the Roche Group
> Office +1 (650) 225-7760
> micha...@gene.com
>
> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube



--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to