Hi Arnaud, Thank you for sharing :)
Katrina Learned UCSC Genome Bioinformatics Group On 9/2/11 7:40 AM, arnaud millehuit wrote: > I found this useful R wrapper of bigWigSummary by Malcom Cook on the > mailing list : > > Here is one approach that has worked for me: >> download it as a wig file >> >> convert to bigWig format using wigToBigWig (c.f. >> http://genomewiki.ucsc.edu/index.php/Kent_source_utilities and/or >> http://hgdownload.cse.ucsc.edu/admin/exe/) >> >> extract regions of interest from the bigWig version using bigWigSummary >> >> if you're doing analysis in R, then the following function might help >> >> bigWigSummaryScore = function(path,chr,start,end) { >> ## PURPOSE: proviude fast indexed retrieval of vector of scores from >> bigWig >> ## file<path> (such as may be used to store phastCons scores). start >> and >> ## end are 1-based (comporting with GRanges) >> ## >> ## Implemented as wrapper of Jim Kent's bigWigSummary. Special care >> ## is taken to return NA for all ranges lacking data (as is NOT done >> ## by the command line tool). >> ## >> ## "when you're giving bigWigSummary coordinate input on the command >> ## line, it's expecting it in our bed format, which is zero based." >> ## per: >> ## https://lists.soe.ucsc.edu/pipermail/genome/2011-February/025099.html >> ## >> call<- >> sprintf('bigWigSummary %s %s %s %s >> %s',path,chr,start-1,end,end-start+1); >> result=suppressWarnings( >> ## thus supressing, eg, "no data in region chrU:10047341-10047388 in >> dm3_phastCons15way.bw" >> lapply(strsplit(system(call, >> intern=TRUE, >> ignore.stdout=FALSE, >> ignore.stderr=TRUE)[1], >> "\t"),as.numeric)[[1]]) >> if(length(result)==1&& is.na(result[[1]])) { >> result=rep(NA,end-start+1) >> } >> result; >> >> >> ~Malcolm >> >> >> > That works well for short strings but one has to be careful because the > function system() will split the ouput if it is greater than 8095 bytes, and > the function above will return only the first one and half thousands values > without returning a warning. > > I wrote a correction to that. I also included the "dataPoints" optional > argument, and a wrapper from GRanges. > > Best, > > Arnaud > ps : I am not yet on the mailing list so I will receive only messages to my > mailbox. > > > bigWigSummaryScore<- function(path, chr, start, end, dataPoints = > end-start+1) { > ## R Wrapper from Malcom Cook : > https://lists.soe.ucsc.edu/pipermail/genome/2011-May/025999.html > ## Adapted by Arnaud Amzallag > ## PURPOSE: provide fast indexed retrieval of vector of scores from > bigWig > ## file<path> (such as may be used to store phastCons scores). start > and > ## end are 1-based (comporting with GRanges) > ## > ## Implemented as wrapper of Jim Kent's bigWigSummary. Special care > ## is taken to return NA for all ranges lacking data (as is NOT done > ## by the command line tool). > ## > ## "when you're giving bigWigSummary coordinate input on the command > ## line, it's expecting it in our bed format, which is zero based." > ## per: > ## https://lists.soe.ucsc.edu/pipermail/genome/2011-February/025099.html > ## > call<- sprintf('bigWigSummary %s %s %s %s %s',path, chr, start-1, end, > dataPoints); > sdin<- suppressWarnings(system(call, intern=TRUE, ignore.stdout=FALSE, > ignore.stderr=TRUE)) > ## thus supressing, eg, "no data in region chrU:10047341-10047388 in > dm3_phastCons15way.bw" > sdin<- do.call(paste, c(as.list(sdin), sep="")) > result<- lapply(strsplit(sdin, > "\t"),as.numeric)[[1]] > ## If you want NA's, and not zeros uncomment the three next lines ; > ## if(length(result)==1&& is.na(result[[1]])) { > ## result=rep(NA,end-start+1) > ## } > result; > } > > > getwig<- function(path, gr, ...) { > if (length(gr)>1 | class(gr)!="GRanges") stop("getwig: gr must be a > GRanges object of length one") > bigWigSummaryScore(path, seqnames(gr), start(gr), end(gr), ...) > } > _______________________________________________ > Genome maillist - [email protected] > https://lists.soe.ucsc.edu/mailman/listinfo/genome _______________________________________________ Genome maillist - [email protected] https://lists.soe.ucsc.edu/mailman/listinfo/genome
