Dear bioc-devel, I had two feedback requests regarding the function read_bed().
1) Did I overlook, and therefore, re-invent existing functionality? 2) If not, would `read_bed` be suited for existence in a more foundational package, e.g. `GenomicRanges`, given the rather basal nature of this functionality? It reads a bedfile into a GRanges, converts the coordinates from 0-based (bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome info (to allow for implicit range validity checking<https://support.bioconductor.org/p/124250>) and plots the karyogram<https://support.bioconductor.org/p/124328>. Thank you for your feedback. Cheers, Aditya #' Read bedfile into GRanges #' #' @param bedfile file path #' @param bsgenome BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus #' @param zero_based logical(1): whether bedfile GRanges are 0-based #' @param rm_duplicates logical(1) #' @param plot logical(1) #' @param verbose logical(1) #' @return \code{\link[GenomicRanges]{GRanges-class}} #' @note By convention BED files are 0-based. GRanges are always 1-based. #' A good discussion on these two alternative codings is given #' by Obi Griffith on Biostars: https://www.biostars.org/p/84686/ #' @examples #' bedfile <- system.file('extdata/SRF.bed', package = 'multicrispr') #' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus #' (gr <- read_bed(bedfile, bsgenome)) #' @importFrom data.table := #' @export read_bed <- function( bedfile, bsgenome, zero_based = TRUE, rm_duplicates = TRUE, plot = TRUE, verbose = TRUE ){ # Assert assert_all_are_existing_files(bedfile) assert_is_a_bool(verbose) assert_is_a_bool(rm_duplicates) assert_is_a_bool(zero_based) # Comply seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL # Read if (verbose) cmessage('\tRead %s', bedfile) dt <- data.table::fread(bedfile, select = c(seq_len(3), 6), col.names = c('seqnames', 'start', 'end', 'strand')) data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand')) # Transform coordinates: 0-based -> 1-based if (zero_based){ if (verbose) cmessage('\t\tConvert 0 -> 1-based') dt[, start := start + 1] } if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes', nrow(dt), length(unique(dt$seqnames))) # Drop duplicates if (rm_duplicates){ is_duplicated <- cduplicated(dt) if (any(is_duplicated)){ if (verbose) cmessage('\t\t %d after removing duplicates') dt %<>% extract(!duplicated) } } # Turn into GRanges gr <- add_seqinfo(as(dt, 'GRanges'), bsgenome) # Plot and return title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile)) if (plot) plot_karyogram(gr, title) gr } [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel