Aha - thx! Aditya ________________________________ From: Shepherd, Lori [lori.sheph...@roswellpark.org] Sent: Tuesday, September 17, 2019 1:02 PM To: Bhagwat, Aditya; bioc-devel@r-project.org Subject: Re: read_bed()
Please look at rtracklayer::import() function that we recommend for reading of BAM files along with other common formats. Cheers, Lori Shepherd Bioconductor Core Team Roswell Park Cancer Institute Department of Biostatistics & Bioinformatics Elm & Carlton Streets Buffalo, New York 14263 ________________________________ From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Bhagwat, Aditya <aditya.bhag...@mpi-bn.mpg.de> Sent: Tuesday, September 17, 2019 6:58 AM To: bioc-devel@r-project.org <bioc-devel@r-project.org> Subject: [Bioc-devel] read_bed() Dear bioc-devel, I had two feedback requests regarding the function read_bed(). 1) Did I overlook, and therefore, re-invent existing functionality? 2) If not, would `read_bed` be suited for existence in a more foundational package, e.g. `GenomicRanges`, given the rather basal nature of this functionality? It reads a bedfile into a GRanges, converts the coordinates from 0-based (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome info (to allow for implicit range validity checking<https://support.bioconductor.org/p/124250>) and plots the karyogram<https://support.bioconductor.org/p/124328>. Thank you for your feedback. Cheers, Aditya #' Read bedfile into GRanges #' #' @param bedfile file path #' @param bsgenome BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus #' @param zero_based logical(1): whether bedfile GRanges are 0-based #' @param rm_duplicates logical(1) #' @param plot logical(1) #' @param verbose logical(1) #' @return \code{\link[GenomicRanges]{GRanges-class}} #' @note By convention BED files are 0-based. GRanges are always 1-based. #' A good discussion on these two alternative codings is given #' by Obi Griffith on Biostars: https://www.biostars.org/p/84686/ #' @examples #' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr') #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus #' (gr <- read_bed(bedfile, bsgenome)) #' @importFrom data.table := #' @export read_bed <- function( bedfile, bsgenome, zero_based = TRUE, rm_duplicates = TRUE, plot = TRUE, verbose = TRUE ){ # Assert assert_all_are_existing_files(bedfile) assert_is_a_bool(verbose) assert_is_a_bool(rm_duplicates) assert_is_a_bool(zero_based) # Comply seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL # Read if (verbose) cmessage('\tRead %s', bedfile) dt <- data.table::fread(bedfile, select = c(seq_len(3), 6), col.names = c('seqnames', 'start', 'end', 'strand')) data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand')) # Transform coordinates: 0-based -> 1-based if (zero_based){ if (verbose) cmessage('\t\tConvert 0 -> 1-based') dt[, start := start + 1] } if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes', nrow(dt), length(unique(dt$seqnames))) # Drop duplicates if (rm_duplicates){ is_duplicated <- cduplicated(dt) if (any(is_duplicated)){ if (verbose) cmessage('\t\t %d after removing duplicates') dt %<>% extract(!duplicated) } } # Turn into GRanges gr <- add_seqinfo(as(dt, 'GRanges'), bsgenome) # Plot and return title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile)) if (plot) plot_karyogram(gr, title) gr } [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you. [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel