Re: [Genome] question about phyoP conservation data

Cook, Malcolm Fri, 20 May 2011 13:51:17 -0700

Here is one approach that has worked for me:

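Download the phyloP scores as a wig file.

Convert it to bigWig format using wigToBigWig (cf. 
http://genomewiki.ucsc.edu/index.php/Kent_source_utilities and/or 
http://hgdownload.cse.ucsc.edu/admin/exe/). wigToBigWig also needs a 
chrom.sizes file for the assembly, which fetchChromSizes (from the same 
utilities) can generate.

Extract the regions of interest from the bigWig file using bigWigSummary. 
Because a bigWig file is indexed by genomic coordinate, you query it by 
chrom:start-end directly, so gaps in the data cannot shift positions the 
way they do when you count values in the flat file; bases without data 
simply come back as missing.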
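If you would rather work from the wig download directly, the gap problem in the question also goes away as long as each value's position is taken from the preceding fixedStep header (chrom=, start=, step=) instead of being counted from the top of the file. Below is a minimal sketch in R. It assumes fixedStep input, as UCSC's wigFix downloads are; variableStep lines and span= are not handled, and parseFixedStepWig is a hypothetical name. For a whole chromosome, this is likely much slower and more memory-hungry than the bigWig route.

```r
## Minimal sketch (hypothetical helper): parse fixedStep wig lines into
## (chrom, pos, score), taking every position from the most recent
## fixedStep header so that gaps between blocks cannot shift coordinates.
## Assumes fixedStep input only; variableStep and span= are NOT handled.
parseFixedStepWig <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                  # drop blank lines
  is_header <- startsWith(lines, "fixedStep")
  if (length(lines) == 0 || !is_header[1])
    stop("first non-blank line must be a fixedStep header")
  block   <- cumsum(is_header)                   # header number for each line
  headers <- lines[is_header]
  ## Value of key=... in each header, or `default` if the key is absent
  getField <- function(key, default) {
    vapply(headers, function(h) {
      m <- regmatches(h, regexpr(paste0("\\b", key, "=\\S+"), h, perl = TRUE))
      if (length(m) == 0) default else sub("^[^=]+=", "", m)
    }, character(1), USE.NAMES = FALSE)
  }
  chrom <- getField("chrom", NA_character_)
  start <- as.numeric(getField("start", NA_character_))
  step  <- as.numeric(getField("step", "1"))
  data_block <- block[!is_header]
  ## Running index 1, 2, 3, ... of each value within its block
  idx <- ave(data_block, data_block, FUN = seq_along)
  data.frame(chrom = chrom[data_block],
             pos   = start[data_block] + step[data_block] * (idx - 1),
             score = as.numeric(lines[!is_header]),
             stringsAsFactors = FALSE)
}

wig <- c("fixedStep chrom=chr1 start=10918 step=1", "0.1", "0.2",
         "fixedStep chrom=chr1 start=11000 step=1", "0.3")
d <- parseFixedStepWig(wig)
## Select by coordinate, not by line number: chr1:10919-11000 returns the
## two bases that have data; the 10920-10999 gap is simply absent
d[d$chrom == "chr1" & d$pos >= 10919 & d$pos <= 11000, ]
```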

If you're doing the analysis in R, the following function might help:

bigWigSummaryScore = function(path,chr,start,end) {
    ## PURPOSE: provide fast indexed retrieval of a vector of scores from
    ## the bigWig file <path> (such as may be used to store phastCons
    ## scores).  start and end are 1-based and inclusive (comporting with
    ## GRanges); one score is returned per base.
    ##
    ## Implemented as a wrapper of Jim Kent's bigWigSummary.  Special care
    ## is taken to return NA for all ranges lacking data (as is NOT done
    ## by the command line tool).
    ##
    ## "when you're giving bigWigSummary coordinate input on the command
    ## line, it's expecting it in our bed format, which is zero based."
    ## per:
    ## https://lists.soe.ucsc.edu/pipermail/genome/2011-February/025099.html
    ##
    ## %.0f (rather than %s) keeps coordinates such as 100000 from being
    ## written in scientific notation ("1e+05").
    n <- end-start+1
    call <-
      sprintf('bigWigSummary %s %s %.0f %.0f %.0f',
              shQuote(path),chr,start-1,end,n)
    result <- suppressWarnings(
      ## suppresses the warning R raises when bigWigSummary exits with an
      ## error (e.g. "no data in region chrU:10047341-10047388 in
      ## dm3_phastCons15way.bw") and the coercion warnings for "n/a" bins
      as.numeric(strsplit(system(call,
                                 intern=TRUE,
                                 ignore.stdout=FALSE,
                                 ignore.stderr=TRUE)[1],
                          "\t")[[1]]))
    if(length(result)==1 && is.na(result[1])) {
      result <- rep(NA_real_,n)
    }
    result
}
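The coordinate handling in the wrapper (start shifted down by one for bigWigSummary's zero-based input, end unchanged, one data point per base) can be checked without running the tool. The sketch below uses a hypothetical helper, bigWigSummaryCommands (not part of the Kent utilities), that turns a data frame of 1-based, inclusive regions, such as the 3000+ regions in the question, into one bigWigSummary command line per region. The file name phyloP.bw is likewise only an example.

```r
## Hypothetical helper, for illustration only: builds (but does not run)
## one bigWigSummary command line per region.
## regions: data.frame with columns chr, start, end (1-based, inclusive)
bigWigSummaryCommands <- function(path, regions) {
  stopifnot(all(c("chr", "start", "end") %in% names(regions)),
            all(regions$end >= regions$start))
  width <- regions$end - regions$start + 1      # one data point per base
  ## Zero-based (BED-style) start: shift start down by one, keep end.
  ## %.0f avoids scientific notation for large coordinates.
  sprintf("bigWigSummary %s %s %.0f %.0f %.0f",
          shQuote(path), regions$chr,
          regions$start - 1, regions$end, width)
}

regions <- data.frame(chr   = c("chr1", "chr2"),
                      start = c(100001, 5),
                      end   = c(100010, 5),
                      stringsAsFactors = FALSE)
cmds <- bigWigSummaryCommands("phyloP.bw", regions)
print(cmds)
```

Each command could then be run with system(..., intern = TRUE) as the wrapper does; regions with no data still need the wrapper's NA handling.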

~Malcolm


> -----Original Message-----
> From: [email protected] [mailto:genome-
> [email protected]] On Behalf Of PKDK
> Sent: Friday, May 20, 2011 3:15 PM
> To: [email protected]
> Subject: [Genome] question about phyoP conservation data
> 
> Hello
> 
> I need to find a way to extract the phyloP conservation data for
> specific areas of the chromosome. I have over 3000 specific areas that
> I need to look at and extract the phyloP values for them.  I thought
> that I could just download the mass data file, but it seems like the
> file does not take into account gaps or areas on the chromosome where
> the phyloP data is not given.
> 
> For example, suppose I wanted the phyloP values for the query:
> chr1:1220332130-1220332150
> 
> In the large phyloP file, the data for chr1 starts at 10918.  If
> there are any gaps between 10918 and 1220332130, then the index to the
> nucleotide number will be thrown off.  Is there any way to compensate
> for gaps?
> 
> Or is there a better way to extract phyloP data?
> 
> Thanks in advance
> 
> Dave
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
