Hi Lori,

I remember now - I tried this function earlier, but it does not work for my 
bedfiles, like the one in attach.

> bedfile      <- system.file('extdata/SRF.bed', package = 'multicrispr')
>
> targetranges <- rtracklayer::import(bedfile, format = 'BED', genome = 'mm10')
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
scan() expected 'an integer', got 'chr2'
>

Perhaps this sentence in `?rtracklayer::import` points to the source of the 
error?

many tools and organizations have extended BED with additional columns. These 
are not officially valid BED files, and as such rtracklayer does not yet 
support them (this will be addressed soon).

Which brings the question: how soon is soon :-D ?

Aditya


________________________________
From: Shepherd, Lori [lori.sheph...@roswellpark.org]
Sent: Tuesday, September 17, 2019 1:02 PM
To: Bhagwat, Aditya; bioc-devel@r-project.org
Subject: Re: read_bed()

Please look at rtracklayer::import()  function that we recommend for reading of 
BAM files along with other common formats.

Cheers,


Lori Shepherd

Bioconductor Core Team

Roswell Park Cancer Institute

Department of Biostatistics & Bioinformatics

Elm & Carlton Streets

Buffalo, New York 14263

________________________________
From: Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Bhagwat, 
Aditya <aditya.bhag...@mpi-bn.mpg.de>
Sent: Tuesday, September 17, 2019 6:58 AM
To: bioc-devel@r-project.org <bioc-devel@r-project.org>
Subject: [Bioc-devel] read_bed()

Dear bioc-devel,

I had two feedback requests regarding the function read_bed().

1) Did I overlook, and therefore, re-invent existing functionality?
2) If not, would `read_bed` be suited for existence in a more foundational 
package, e.g. `GenomicRanges`, given the rather basal nature of this 
functionality?

It reads a bedfile into a GRanges, converts the coordinates from 0-based 
(bedfile) to 1-based (GRanges)<https://www.biostars.org/p/84686>, adds BSgenome 
info (to allow for implicit range validity 
checking<https://support.bioconductor.org/p/124250>) and plots the 
karyogram<https://support.bioconductor.org/p/124328>.

Thank you for your feedback.

Cheers,

Aditya


#' Read bedfile into GRanges
#'
#' @param bedfile        file path
#' @param bsgenome       BSgenome, e.g. BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' @param zero_based     logical(1): whether bedfile GRanges are 0-based
#' @param rm_duplicates  logical(1)
#' @param plot           logical(1)
#' @param verbose        logical(1)
#' @return \code{\link[GenomicRanges]{GRanges-class}}
#' @note By convention BED files are 0-based. GRanges are always 1-based.
#'       A good discussion on these two alternative codings is given
#'       by Obi Griffith on Biostars: https://www.biostars.org/p/84686/
#' @examples
#' bedfile  <- system.file('extdata/SRF.bed', package = 'multicrispr')
#' bsgenome <- BSgenome.Mmusculus.UCSC.mm10::Mmusculus
#' (gr <- read_bed(bedfile, bsgenome))
#' @importFrom  data.table  :=
#' @export
read_bed <- function(
    bedfile,
    bsgenome,
    zero_based    = TRUE,
    rm_duplicates = TRUE,
    plot          = TRUE,
    verbose       = TRUE
){
    # Assert
    assert_all_are_existing_files(bedfile)
    assert_is_a_bool(verbose)
    assert_is_a_bool(rm_duplicates)
    assert_is_a_bool(zero_based)

    # Comply
    seqnames <- start <- end <- strand <- .N <- gap <- width <- NULL

    # Read
    if (verbose) cmessage('\tRead %s', bedfile)
    dt <- data.table::fread(bedfile, select = c(seq_len(3), 6),
            col.names = c('seqnames', 'start', 'end', 'strand'))
    data.table::setorderv(dt, c('seqnames', 'start', 'end', 'strand'))

    # Transform coordinates: 0-based -> 1-based
    if (zero_based){
        if (verbose)    cmessage('\t\tConvert 0 -> 1-based')
        dt[, start := start + 1]
    }

    if (verbose) cmessage('\t\tRanges: %d ranges on %d chromosomes',
                            nrow(dt), length(unique(dt$seqnames)))

    # Drop duplicates
    if (rm_duplicates){
        is_duplicated <- cduplicated(dt)
        if (any(is_duplicated)){
            if (verbose) cmessage('\t\t        %d after removing duplicates')
            dt %<>% extract(!duplicated)
        }
    }

    # Turn into GRanges
    gr <-  add_seqinfo(as(dt, 'GRanges'), bsgenome)

    # Plot and return
    title <- paste0(providerVersion(bsgenome), ': ', basename(bedfile))
    if (plot) plot_karyogram(gr, title)
    gr
}


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

This email message may contain legally privileged and/or confidential 
information. If you are not the intended recipient(s), or the employee or agent 
responsible for the delivery of this message to the intended recipient(s), you 
are hereby notified that any disclosure, copying, distribution, or use of this 
email message is prohibited. If you have received this message in error, please 
notify the sender immediately by e-mail and delete this email message from your 
computer. Thank you.
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to