Hi Arnaud,

Thank you for sharing :)

Katrina Learned
UCSC Genome Bioinformatics Group


On 9/2/11 7:40 AM, arnaud millehuit wrote:
>   I found this useful R wrapper of bigWigSummary by Malcom Cook on the
> mailing list :
>
> Here is one approach that has worked for me:
>> download it as a wig file
>>
>> convert to bigWig format using wigToBigWig  (c.f. 
>> http://genomewiki.ucsc.edu/index.php/Kent_source_utilities and/or 
>> http://hgdownload.cse.ucsc.edu/admin/exe/)
>>
>> extract regions of interest from the bigWig version using bigWigSummary
>>
>> if you're doing analysis in R, then the following function might help
>>
>> bigWigSummaryScore = function(path,chr,start,end) {
>>      ## PURPOSE: proviude fast indexed retrieval of vector of scores from 
>> bigWig
>>      ## file<path>  (such as may be used to store phastCons scores).  start 
>> and
>>      ## end are 1-based (comporting with GRanges)
>>      ##
>>      ## Implemented as wrapper of Jim Kent's bigWigSummary.  Special care
>>      ## is taken to return NA for all ranges lacking data (as is NOT done
>>      ## by the command line tool).
>>      ##
>>      ## "when you're giving bigWigSummary coordinate input on the command
>>      ## line, it's expecting it in our bed format, which is zero based."
>>      ## per:
>>      ## https://lists.soe.ucsc.edu/pipermail/genome/2011-February/025099.html
>>      ##
>>      call<-
>>        sprintf('bigWigSummary %s %s %s %s 
>> %s',path,chr,start-1,end,end-start+1);
>>      result=suppressWarnings(
>>        ## thus supressing, eg, "no data in region chrU:10047341-10047388 in 
>> dm3_phastCons15way.bw"
>>        lapply(strsplit(system(call,
>>                               intern=TRUE,
>>                               ignore.stdout=FALSE,
>>                               ignore.stderr=TRUE)[1],
>>                        "\t"),as.numeric)[[1]])
>>      if(length(result)==1&&  is.na(result[[1]])) {
>>        result=rep(NA,end-start+1)
>>      }
>>      result;
>>
>>
>> ~Malcolm
>>
>>
>>
> That works well for short strings but one has to be careful because the
> function system() will split the ouput if it is greater than 8095 bytes, and
> the function above will return only the first one and half thousands values
> without returning a warning.
>
> I wrote a correction to that. I also included the "dataPoints" optional
> argument, and a wrapper from GRanges.
>
> Best,
>
> Arnaud
> ps : I am not yet on the mailing list so I will receive only messages to my
> mailbox.
>
>
> bigWigSummaryScore<- function(path, chr, start, end, dataPoints =
> end-start+1) {
>      ## R Wrapper from Malcom Cook :
> https://lists.soe.ucsc.edu/pipermail/genome/2011-May/025999.html
>      ## Adapted by Arnaud Amzallag
>      ## PURPOSE: provide fast indexed retrieval of vector of scores from
> bigWig
>      ## file<path>  (such as may be used to store phastCons scores).  start
> and
>      ## end are 1-based (comporting with GRanges)
>      ##
>      ## Implemented as wrapper of Jim Kent's bigWigSummary.  Special care
>      ## is taken to return NA for all ranges lacking data (as is NOT done
>      ## by the command line tool).
>      ##
>      ## "when you're giving bigWigSummary coordinate input on the command
>      ## line, it's expecting it in our bed format, which is zero based."
>      ## per:
>      ## https://lists.soe.ucsc.edu/pipermail/genome/2011-February/025099.html
>      ##
>      call<- sprintf('bigWigSummary %s %s %s %s %s',path, chr, start-1, end,
> dataPoints);
>      sdin<- suppressWarnings(system(call, intern=TRUE, ignore.stdout=FALSE,
> ignore.stderr=TRUE))
>      ## thus supressing, eg, "no data in region chrU:10047341-10047388 in
> dm3_phastCons15way.bw"
>      sdin<- do.call(paste, c(as.list(sdin), sep=""))
>      result<- lapply(strsplit(sdin,
>                        "\t"),as.numeric)[[1]]
>      ## If you want NA's, and not zeros uncomment the three next lines ;
>      ## if(length(result)==1&&  is.na(result[[1]])) {
>      ## result=rep(NA,end-start+1)
>      ## }
>      result;
>    }
>
>
> getwig<- function(path, gr, ...) {
>    if (length(gr)>1 | class(gr)!="GRanges") stop("getwig: gr must be a
> GRanges object of length one")
>    bigWigSummaryScore(path, seqnames(gr), start(gr), end(gr), ...)
> }
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome

Reply via email to