Thankyou Michael, 

How do I use the extraCols argument? The documentation does not mention an 
`extraCols` argument explicitly, so it must be one of the ellipsis arguments, 
but `?rtracklayer::import` does not mention it. Should I say extraCols = 10 
(ten extra columns) or so?

Aditya

________________________________________
From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 2:05 PM
To: Bhagwat, Aditya
Cc: Michael Lawrence; Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

It breaks it because it's not standard BED; however, using the
extraCols= argument should work in this case. Requiring an explicit
format specification is intentional, because it provides validation
and type safety, and it communicates the format to a future reader.
This also looks a bit like a bedPE file, so you might consider using
the Pairs data structure.

Michael

On Tue, Sep 17, 2019 at 4:51 AM Bhagwat, Aditya
<aditya.bhag...@mpi-bn.mpg.de> wrote:
>
> Hi Michael,
>
> Yeah, I also noticed that the attachment was eaten when it entered the 
> bio-devel list.
>
> The file is also accessible in the extdata of the multicrispr:
> https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed
>
> A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
> (chromEnd), and column 6 (strand). All of these are present in SRF.bed.
>
> I am curious as to why you feel that having additional columns in a bedfile 
> would break it?
>
> Cheers,
>
> Aditya
>
> ________________________________________
> From: Michael Lawrence [lawrence.mich...@gene.com]
> Sent: Tuesday, September 17, 2019 1:41 PM
> To: Bhagwat, Aditya
> Cc: Shepherd, Lori; bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] read_bed()
>
> I don't see an attachment, nor can I find the multicrispr package
> anywhere. The "addressed soon" was referring to the BEDX+Y formats,
> which was addressed many years ago, so I've updated the documentation.
> Broken BED files will never be supported.
>
> Michael
>
> On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
> <aditya.bhag...@mpi-bn.mpg.de> wrote:
> >
> > Hi Lori,
> >
> > I remember now - I tried this function earlier, but it does not work for my 
> > bedfiles, like the one in attach.
> >
> > > bedfile      <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > >
> > > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > > 'mm10')
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
> > dec,  : scan() expected 'an integer', got 'chr2'
> > >
> >
> > Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> > error?
> >
> > many tools and organizations have extended BED with additional columns. 
> > These are not officially valid BED files, and as such rtracklayer does not 
> > yet support them (this will be addressed soon).
> >
> > Which brings the question: how soon is soon :-D ?
> >
> > Aditya
> >
> >
> > ________________________________
> > From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> > Sent: Tuesday, September 17, 2019 1:02 PM
> > To: Bhagwat, Aditya; bioc-devel@r-project.org
> > Subject: Re: read_bed()
> >
> > Please look at rtracklayer::import()  function that we recommend for 
> > reading of BAM files along with other common formats.
> >
> > Cheers,
> >
> >
> > Lori Shepherd
> >
> > Bioconductor Core Team
> >
> > Roswell Park Cancer Institute
> >
> > Department of Biostatistics & Bioinformatics
> >
> > Elm & Carlton Streets
> >
> > Buffalo, New York 14263
> >
> > ________________________________
> > From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Bhagwat, 
> > Aditya <aditya.bhag...@mpi-bn.mpg.de>
> > Sent: Tuesday, September 17, 2019 6:58 AM
> > To: bioc-devel@r-project.org <bioc-devel@r-project.org>
> > Subject: [Bioc-devel] read_bed()
> >
> > Dear bioc-devel,
> >
> > I had two feedback requests regarding the function read_bed().
> >
> > 1) Did I overlook, and therefore, re-invent existing functionality?
> > 2) If not, would `read_bed` be suited for existence in a more foundational 
> > package, e.g. `GenomicRanges`, given the rather basal nature of this 
> > functionality?
> >
> > It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> > (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> > BSgenome info (to allow for implicit range validity 
> > checking<https://support.bioconductor.org/p/124250>) and plots the 
> > karyogram<https://support.bioconductor.org/p/124328>.
> >
> > Thank you for your feedback.
> >
> > Cheers,
> >
> > Aditya
> >
> >
> > #' Read bedfile into GRanges
> > #'
> > #' @param bedfile        file path
> > #' @param bsgenome       BSgenome, e.g. 
> > BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> > #' @param zero_based     logical(1): whether bedfile GRanges are 0-based
> > #' @param rm_duplicates  logical(1)
> > #' @param plot           logical(1)
> > #' @param verbose        logical(1)
> > #' @return \code{\link[GenomicRanges]{GRanges-class}}
> > #' @note By convention BED files are 0-based. GRanges are always 1-based.
> > #'       A good discussion on these two alternative codings is given
> > #'       by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> > #' @examples
> > #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> > #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> > #' (gr <- read_bed(bedfile, bsgenome))
> > #' @importFrom  data.table  :=
> > #' @export
> > read_bed <- function(
> >     bedfile,
> >     bsgenome,
> >     zero_based    = TRUE,
> >     rm_duplicates = TRUE,
> >     plot          = TRUE,
> >     verbose       = TRUE
> > ){
> >     # Assert
> >     assert_all_are_existing_files(bedfile)
> >     assert_is_a_bool(verbose)
> >     assert_is_a_bool(rm_duplicates)
> >     assert_is_a_bool(zero_based)
> >
> >     # Comply
> >     seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL
> >
> >     # Read
> >     if (verbose) cmessage('\tRead %s', bedfile)
> >     dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
> >             col.names = c('seqnames', 'start', 'end', 'strand'))
> >     data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))
> >
> >     # Transform coordinates: 0-based -> 1-based
> >     if (zero_based){
> >         if (verbose)    cmessage('\t\tConvert 0 -> 1-based')
> >         dt[, start := start + 1]
> >     }
> >
> >     if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
> >                             nrow(dt), length(unique(dt$seqnames)))
> >
> >     # Drop duplicates
> >     if (rm_duplicates){
> >         is_duplicated <- cduplicated(dt)
> >         if (any(is_duplicated)){
> >             if (verbose) cmessage('\t\t        %d after removing 
> > duplicates')
> >             dt %<>% extract(!duplicated)
> >         }
> >     }
> >
> >     # Turn into GRanges
> >     gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)
> >
> >     # Plot and return
> >     title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
> >     if (plot) plot_karyogram(gr, title)
> >     gr
> > }
> >
> >
> >         [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> > This email message may contain legally privileged and/or confidential 
> > information. If you are not the intended recipient(s), or the employee or 
> > agent responsible for the delivery of this message to the intended 
> > recipient(s), you are hereby notified that any disclosure, copying, 
> > distribution, or use of this email message is prohibited. If you have 
> > received this message in error, please notify the sender immediately by 
> > e-mail and delete this email message from your computer. Thank you.
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
> --
> Michael Lawrence
> Scientist, Bioinformatics and Computational Biology
> Genentech, A Member of the Roche Group
> Office +1 (650) 225-7760
> micha...@gene.com
>
> Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube



--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to