Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome 
info (to allow for implicit range validity 
checking<https://support.bioconductor.org/p/124250>) and plots the 
karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfile        file path
#' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based     logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot           logical(1)
#' @param verbose        logical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'       A good discussion on these two alternative codings is given
#'       by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
    bedfile,
    bsgenome,
    zero_based    = TRUE,
    rm_duplicates = TRUE,
    plot          = TRUE,
    verbose       = TRUE
){
    # Assert
    assert_all_are_existing_files(bedfile)
    assert_is_a_bool(verbose)
    assert_is_a_bool(rm_duplicates)
    assert_is_a_bool(zero_based)

    # Comply
    seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

    # Read
    if (verbose) cmessage('\tRead %s', bedfile)
    dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
            col.names = c('seqnames', 'start', 'end', 'strand'))
    data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

    # Transform coordinates: 0-based -> 1-based
    if (zero_based){
        if (verbose)    cmessage('\t\tConvert 0 -> 1-based')
        dt[, start := start + 1]
    }

    if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
                            nrow(dt), length(unique(dt$seqnames)))

    # Drop duplicates
    if (rm_duplicates){
        is_duplicated <- cduplicated(dt)
        if (any(is_duplicated)){
            if (verbose) cmessage('\t\t        %d after removing duplicates')
            dt %<>% extract(!duplicated)
        }
    }

    # Turn into GRanges
    gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

    # Plot and return
    title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
    if (plot) plot_karyogram(gr, title)
    gr
}


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to