I found this useful R wrapper of bigWigSummary by Malcom Cook on the mailing list :
Here is one approach that has worked for me: > > download it as a wig file > > convert to bigWig format using wigToBigWig (c.f. > http://genomewiki.ucsc.edu/index.php/Kent_source_utilities and/or > http://hgdownload.cse.ucsc.edu/admin/exe/) > > extract regions of interest from the bigWig version using bigWigSummary > > if you're doing analysis in R, then the following function might help > > bigWigSummaryScore = function(path,chr,start,end) { > ## PURPOSE: proviude fast indexed retrieval of vector of scores from > bigWig > ## file <path> (such as may be used to store phastCons scores). start and > ## end are 1-based (comporting with GRanges) > ## > ## Implemented as wrapper of Jim Kent's bigWigSummary. Special care > ## is taken to return NA for all ranges lacking data (as is NOT done > ## by the command line tool). > ## > ## "when you're giving bigWigSummary coordinate input on the command > ## line, it's expecting it in our bed format, which is zero based." > ## per: > ## https://lists.soe.ucsc.edu/pipermail/genome/2011-February/025099.html > ## > call <- > sprintf('bigWigSummary %s %s %s %s > %s',path,chr,start-1,end,end-start+1); > result=suppressWarnings( > ## thus supressing, eg, "no data in region chrU:10047341-10047388 in > dm3_phastCons15way.bw" > lapply(strsplit(system(call, > intern=TRUE, > ignore.stdout=FALSE, > ignore.stderr=TRUE)[1], > "\t"),as.numeric)[[1]]) > if(length(result)==1 && is.na(result[[1]])) { > result=rep(NA,end-start+1) > } > result; > > > ~Malcolm > > > That works well for short strings but one has to be careful because the function system() will split the ouput if it is greater than 8095 bytes, and the function above will return only the first one and half thousands values without returning a warning. I wrote a correction to that. I also included the "dataPoints" optional argument, and a wrapper from GRanges. Best, Arnaud ps : I am not yet on the mailing list so I will receive only messages to my mailbox. bigWigSummaryScore <- function(path, chr, start, end, dataPoints = end-start+1) { ## R Wrapper from Malcom Cook : https://lists.soe.ucsc.edu/pipermail/genome/2011-May/025999.html ## Adapted by Arnaud Amzallag ## PURPOSE: provide fast indexed retrieval of vector of scores from bigWig ## file <path> (such as may be used to store phastCons scores). start and ## end are 1-based (comporting with GRanges) ## ## Implemented as wrapper of Jim Kent's bigWigSummary. Special care ## is taken to return NA for all ranges lacking data (as is NOT done ## by the command line tool). ## ## "when you're giving bigWigSummary coordinate input on the command ## line, it's expecting it in our bed format, which is zero based." ## per: ## https://lists.soe.ucsc.edu/pipermail/genome/2011-February/025099.html ## call <- sprintf('bigWigSummary %s %s %s %s %s',path, chr, start-1, end, dataPoints); sdin <- suppressWarnings(system(call, intern=TRUE, ignore.stdout=FALSE, ignore.stderr=TRUE)) ## thus supressing, eg, "no data in region chrU:10047341-10047388 in dm3_phastCons15way.bw" sdin <- do.call(paste, c(as.list(sdin), sep="")) result <- lapply(strsplit(sdin, "\t"),as.numeric)[[1]] ## If you want NA's, and not zeros uncomment the three next lines ; ## if(length(result)==1 && is.na(result[[1]])) { ## result=rep(NA,end-start+1) ## } result; } getwig <- function(path, gr, ...) { if (length(gr) >1 | class(gr)!="GRanges") stop("getwig: gr must be a GRanges object of length one") bigWigSummaryScore(path, seqnames(gr), start(gr), end(gr), ...) } _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
