Hi Michael,

Yeah, I also noticed that the attachment was eaten when it entered the bioc-devel list.
The file is also accessible in the extdata of the multicrispr package:
https://gitlab.gwdg.de/loosolab/software/multicrispr/blob/master/inst/extdata/SRF.bed

A bedfile-to-GRanges importer requires columns 1 (chrom), 2 (chromStart), 3 (chromEnd), and 6 (strand). All of these are present in SRF.bed. I am curious why you feel that having additional columns in a bedfile would break it?

Cheers,

Aditya

________________________________________
From: Michael Lawrence [lawrence.mich...@gene.com]
Sent: Tuesday, September 17, 2019 1:41 PM
To: Bhagwat, Aditya
Cc: Shepherd, Lori; bioc-devel@r-project.org
Subject: Re: [Bioc-devel] read_bed()

I don't see an attachment, nor can I find the multicrispr package anywhere. The "addressed soon" was referring to the BEDX+Y formats, which were addressed many years ago, so I've updated the documentation. Broken BED files will never be supported.

Michael

On Tue, Sep 17, 2019 at 4:17 AM Bhagwat, Aditya <aditya.bhag...@mpi-bn.mpg.de> wrote:
>
> Hi Lori,
>
> I remember now - I tried this function earlier, but it does not work for my
> bedfiles, like the attached one.
>
> > bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr')
> >
> > targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 'mm10')
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
>   scan() expected 'an integer', got 'chr2'
>
> Perhaps this sentence in `?rtracklayer::import` points to the source of the
> error?
>
> many tools and organizations have extended BED with additional columns. These
> are not officially valid BED files, and as such rtracklayer does not yet
> support them (this will be addressed soon).
>
> Which brings the question: how soon is soon :-D ?
>
> Aditya
>
> ________________________________
> From: Shepherd, Lori [lori.sheph...@roswellpark.org]
> Sent: Tuesday, September 17, 2019 1:02 PM
> To: Bhagwat, Aditya; bioc-devel@r-project.org
> Subject: Re: read_bed()
>
> Please look at the rtracklayer::import() function, which we recommend for
> reading BAM files along with other common formats.
>
> Cheers,
>
> Lori Shepherd
> Bioconductor Core Team
> Roswell Park Cancer Institute
> Department of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263
>
> ________________________________
> From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Bhagwat, Aditya <aditya.bhag...@mpi-bn.mpg.de>
> Sent: Tuesday, September 17, 2019 6:58 AM
> To: bioc-devel@r-project.org <bioc-devel@r-project.org>
> Subject: [Bioc-devel] read_bed()
>
> Dear bioc-devel,
>
> I had two feedback requests regarding the function read_bed().
>
> 1) Did I overlook, and therefore re-invent, existing functionality?
> 2) If not, would `read_bed` be suited for a more foundational
>    package, e.g. `GenomicRanges`, given the rather basic nature of this
>    functionality?
>
> It reads a bedfile into a GRanges, converts the coordinates from 0-based
> (bedfile) to 1-based (GRanges) <https://www.biostars.org/p/84686>, adds
> BSgenome info (to allow for implicit range validity
> checking <https://support.bioconductor.org/p/124250>) and plots the
> karyogram <https://support.bioconductor.org/p/124328>.
>
> Thank you for your feedback.
>
> Cheers,
>
> Aditya
>
>
> #' Read bedfile into GRanges
> #'
> #' @param bedfile        file path
> #' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' @param zero_based     logical(1): whether bedfile coordinates are 0-based
> #' @param rm_duplicates  logical(1)
> #' @param plot           logical(1)
> #' @param verbose        logical(1)
> #' @return \code{\link[GenomicRanges]{GRanges-class}}
> #' @note By convention BED files are 0-based. GRanges are always 1-based.
> #' A good discussion on these two alternative codings is given
> #' by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
> #' @examples
> #' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
> #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
> #' (gr <- read_bed(bedfile, bsgenome))
> #' @importFrom data.table :=
> #' @importFrom magrittr %<>% extract
> #' @export
> read_bed <- function(
>     bedfile,
>     bsgenome,
>     zero_based    = TRUE,
>     rm_duplicates = TRUE,
>     plot          = TRUE,
>     verbose       = TRUE
> ){
>     # Assert
>     assert_all_are_existing_files(bedfile)
>     assert_is_a_bool(verbose)
>     assert_is_a_bool(rm_duplicates)
>     assert_is_a_bool(zero_based)
>     assert_is_a_bool(plot)
>
>     # Comply (silence R CMD check notes about data.table column names)
>     seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL
>
>     # Read columns 1-3 (chrom, chromStart, chromEnd) and 6 (strand)
>     if (verbose) cmessage('\tRead %s', bedfile)
>     dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
>             col.names = c('seqnames', 'start', 'end', 'strand'))
>     data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))
>
>     # Transform coordinates: 0-based -> 1-based (only the start shifts)
>     if (zero_based){
>         if (verbose) cmessage('\t\tConvert 0 -> 1-based')
>         dt[, start := start + 1]
>     }
>
>     if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
>                     nrow(dt), length(unique(dt$seqnames)))
>
>     # Drop duplicates
>     if (rm_duplicates){
>         is_duplicated <- cduplicated(dt)
>         if (any(is_duplicated)){
>             # Report how many ranges remain once the flagged rows are dropped
>             if (verbose) cmessage('\t\t        %d after removing duplicates',
>                             sum(!is_duplicated))
>             # Index with the logical vector, not the duplicated() function
>             dt %<>% extract(!is_duplicated)
>         }
>     }
>
>     # Turn into GRanges
>     gr <- add_seqinfo(as(dt, 'GRanges'), bsgenome)
>
>     # Plot and return
>     title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
>     if (plot) plot_karyogram(gr, title)
>     gr
> }
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
micha...@gene.com
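
For reference, the column selection and coordinate shift discussed in this thread can be sketched in base R. This is only an illustration under stated assumptions: `read_bed_minimal` and the toy file contents are hypothetical, not code from multicrispr or rtracklayer, and the result is a plain data.frame rather than a GRanges.

```r
# Hypothetical helper (not part of multicrispr or rtracklayer): keep BED
# columns 1, 2, 3 and 6, and shift the 0-based start to a 1-based start.
read_bed_minimal <- function(bedfile, zero_based = TRUE) {
    bed <- utils::read.table(bedfile, sep = "\t", header = FALSE,
                             stringsAsFactors = FALSE, comment.char = "#")
    stopifnot(ncol(bed) >= 6)  # the strand lives in column 6
    df <- data.frame(
        seqnames = bed[[1]],
        start    = as.integer(bed[[2]]),
        end      = as.integer(bed[[3]]),
        strand   = bed[[6]],
        stringsAsFactors = FALSE
    )
    # BED intervals are 0-based and half-open, so only the start moves
    if (zero_based) df$start <- df$start + 1L
    df
}

# Toy file with made-up coordinates and two extra columns (7 and 8)
toy_bed <- tempfile(fileext = ".bed")
writeLines(c(
    "chr2\t100\t200\tpeak1\t0\t+\t3.5\textra",
    "chr2\t300\t450\tpeak2\t0\t-\t1.2\textra"
), toy_bed)

ranges <- read_bed_minimal(toy_bed)
print(ranges)
```

If GenomicRanges is installed, `GenomicRanges::makeGRangesFromDataFrame(ranges)` should turn this data.frame into a GRanges; attaching seqinfo and plotting are left out of the sketch.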