Hi Michael,

Yeah, I also noticed that the attachment was eaten when it entered the 
bio-devel list. 

The file is also accessible in the extdata of the multicrispr:
https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed

A bedfile to GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 
(chromEnd), and column 6 (strand). All of these are present in SRF.bed.

I am curious as to why you feel that having additional columns in a bedfile 
would break it?

Cheers,

Aditya

________________________________________
From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 1:41 PM
To: Bhagwat, Aditya
Cc: Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I don't see an attachment, nor can I find the multicrispr package
anywhere. The "addressed soon" was referring to the BEDX+Y formats,
which was addressed many years ago, so I've updated the documentation.
Broken BED files will never be supported.

Michael

On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya
<aditya.bhag...@mpi-bn.mpg.de> wrote:
>
> Hi Lori,
>
> I remember now - I tried this function earlier, but it does not work for my 
> bedfiles, like the one in attach.
>
> > bedfile      <- system.file('extdata/SRF.bed', package = 'multicrispr')
> >
> > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 
> > 'mm10')
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  
> : scan() expected 'an integer', got 'chr2'
> >
>
> Perhaps this sentence in `?rtracklayer::import` points to the source of the 
> error?
>
> many tools and organizations have extended BED with additional columns. These 
> are not officially valid BED files, and as such rtracklayer does not yet 
> support them (this will be addressed soon).
>
> Which brings the question: how soon is soon :-D ?
>
> Aditya
>
>
> ________________________________
> From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> Sent: Tuesday, September 17, 2019 1:02 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: read_bed()
>
> Please look at rtracklayer::import()  function that we recommend for reading 
> of BAM files along with other common formats.
>
> Cheers,
>
>
> Lori Shepherd
>
> Bioconductor Core Team
>
> Roswell Park Cancer Institute
>
> Department of Biostatistics & Bioinformatics
>
> Elm & Carlton Streets
>
> Buffalo, New York 14263
>
> ________________________________
> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Bhagwat, 
> Aditya <aditya.bhag...@mpi-bn.mpg.de>
> Sent: Tuesday, September 17, 2019 6:58 AM
> To: bioc-devel@r-project.org <bioc-devel@r-project.org>
> Subject: [Bioc-devel] read_bed()
>
> Dear bioc-devel,
>
> I had two feedback requests regarding the function read_bed().
>
> 1) Did I overlook, and therefore, re-invent existing functionality?
> 2) If not, would `read_bed` be suited for existence in a more foundational 
> package, e.g. `GenomicRanges`, given the rather basal nature of this 
> functionality?
>
> It reads a bedfile into a GRanges, converts the coordinates from 0-based 
> (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds 
> BSgenome info (to allow for implicit range validity 
> checking<https://support.bioconductor.org/p/124250>) and plots the 
> karyogram<https://support.bioconductor.org/p/124328>.
>
> Thank you for your feedback.
>
> Cheers,
>
> Aditya
>
>
> #' Read bedfile into GRanges
> #'
> #' @param bedfile        file path
> #' @param bsgenome       BSgenome, e.g. 
> BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' @param zero_based     logical(1): whether bedfile GRanges are 0-based
> #' @param rm_duplicates  logical(1)
> #' @param plot           logical(1)
> #' @param verbose        logical(1)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @note By convention BED files are 0-based. GRanges are always 1-based.
> #'       A good discussion on these two alternative codings is given
> #'       by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> #' @examples
> #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' (gr <- read_bed(bedfile, bsgenome))
> #' @importFrom  data.table  :=
> #' @export
> read_bed <- function(
>     bedfile,
>     bsgenome,
>     zero_based    = TRUE,
>     rm_duplicates = TRUE,
>     plot          = TRUE,
>     verbose       = TRUE
> ){
>     # Assert
>     assert_all_are_existing_files(bedfile)
>     assert_is_a_bool(verbose)
>     assert_is_a_bool(rm_duplicates)
>     assert_is_a_bool(zero_based)
>
>     # Comply
>     seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL
>
>     # Read
>     if (verbose) cmessage('\tRead %s', bedfile)
>     dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
>             col.names = c('seqnames', 'start', 'end', 'strand'))
>     data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))
>
>     # Transform coordinates: 0-based -> 1-based
>     if (zero_based){
>         if (verbose)    cmessage('\t\tConvert 0 -> 1-based')
>         dt[, start := start + 1]
>     }
>
>     if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
>                             nrow(dt), length(unique(dt$seqnames)))
>
>     # Drop duplicates
>     if (rm_duplicates){
>         is_duplicated <- cduplicated(dt)
>         if (any(is_duplicated)){
>             if (verbose) cmessage('\t\t        %d after removing duplicates')
>             dt %<>% extract(!duplicated)
>         }
>     }
>
>     # Turn into GRanges
>     gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)
>
>     # Plot and return
>     title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
>     if (plot) plot_karyogram(gr, title)
>     gr
> }
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> This email message may contain legally privileged and/or confidential 
> information. If you are not the intended recipient(s), or the employee or 
> agent responsible for the delivery of this message to the intended 
> recipient(s), you are hereby notified that any disclosure, copying, 
> distribution, or use of this email message is prohibited. If you have 
> received this message in error, please notify the sender immediately by 
> e-mail and delete this email message from your computer. Thank you.
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel



--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com

Join Genentech on LinkedIn | Twitter | Facebook | Instagram | YouTube

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to