Hi,

This is probably a simple question, but I can't quite figure it out for myself. I've been using RangedData (etc) quite a lot and have often found myself looping through the spaces when I probably don't need to. I'm trying not to do that now...

I have some scores on a bunch of genomic intervals I'm interested in (they're conservation scores in 4kb promoter regions, e.g. phastCons scores). I've stored these as a SimpleRleList object.

I also have some smaller subregions (matches to transcription factor motifs) and I want to compare the distribution of scores inside those subregions to scores outside those subregions. I have the regions as RangedData objects, over multiple spaces. I want to do this on individual base pair scores, rather than by taking subregion means and comparing those (so I don't want to use viewSums).

I've managed to get the scores inside and outside the regions using the RleViewsList function (following the example in this thread: http://thread.gmane.org/gmane.comp.lang.r.sequencing/403/focus=408 ). The scores are now as SimpleRleViewsList.

I'm currently looping through the scores (the SimpleRleViewsList) space by space, converting to simple vector and concatenating the scores, but I imagine there's a way to do it without looping. Any ideas?

Below is a simple example (extended from the older thread). My real scores are not integers and there's many more of them.

#########################
library(IRanges)

### generate some arbitrary scores
track <- RangedData(RangesList(chrA = IRanges(start = c(1, 4, 6), width=c(3, 2, 4)),chrB = IRanges(start = c(1, 3, 6), width=c(3, 3, 4))) ) trackCoverage <- coverage(track, weight=list(chrA=c(2,7,3),chrB=c(1,1,1)) )

### define subregions
exons <- RangesList(chrA = IRanges(start = c(2, 4), width = c(2, 2)),chrB = IRanges(start = 3, width = 5))

###get exon and intron scores
exonView <- RleViewsList(rleList=trackCoverage, rangesList=exons)
intronView <- RleViewsList(rleList=trackCoverage, rangesList=gaps(ranges(RangedData(exons)),start=1,end=9))

### now convert those to simple integer objects. Loop through chromosomes. Can I do this without looping? ### (I tried using unlist, but it does something weird to the scores, as it assumes they're all from the same space)
exonScores <- integer()
intronScores <- integer()
for (chr in names(exonView) ) {
thischrexonscores <- as.integer(seqselect(trackCoverage[[chr]], start=start(exonView[[chr]]),end=end(exonView[[chr]]) ))
     exonScores <- c(exonScores,thischrexonscores)

thischrintronscores <- as.integer(seqselect(trackCoverage[[chr]], start=start(intronView[[chr]]),end=end(intronView[[chr]]) ))
     intronScores <- c(intronScores,thischrintronscores)

}

###and do something with the scores to compare distributions
wilcox.test(exonScores,intronScores)

#########################

thanks,

Janet

-------------------------------------------------------------------

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org

http://www.fhcrc.org/labs/trask/

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to