Firstly, I'd like to thank the authors of the very useful ChIPpeakAnno package. 
I'd like to report a behaviour of the insideFeature column returned by the 
ChIPpeakAnno annotatePeakInBatch function that other users may or may not be 
aware of. I also propose an improvement to compensate.

The resulting insideFeature column reports TRUE if the query peak is contained 
within an annotated feature, and also reports TRUE if the peak overlaps the 
end of an annotated feature.

I think it's worth noting that it reports FALSE if the peak overlaps the 
beginning of an annotated feature, and also reports FALSE if the peak spans 
one or more annotated features in their entirety.

So, perhaps the insideFeature column (or an additional new column called 
overlappingFeature) could report one of five values: 
("false","inside","overlapStart","overlapEnd","super"). I haven't looked into 
how distancetoFeature should or could be calculated for each of these 
scenarios.
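
A minimal sketch of such a five-way classification, in base R only. The function name classifyOverlap is hypothetical and not part of ChIPpeakAnno. It assumes closed integer intervals, one feature per peak, and features on the plus strand (on the minus strand the overlapStart and overlapEnd labels would swap). A peak whose boundary coincides with a feature boundary is treated as "inside" if it does not extend past the feature, and as "super" otherwise.

```r
# Hypothetical helper, NOT part of ChIPpeakAnno: classify how each peak
# [pStart, pEnd] relates to its feature [fStart, fEnd] (closed intervals,
# plus strand assumed).
classifyOverlap <- function(pStart, pEnd, fStart, fEnd) {
  out      <- rep("false", length(pStart))           # default: no overlap
  overlaps <- pStart <= fEnd & pEnd >= fStart
  inside   <- overlaps & pStart >= fStart & pEnd <= fEnd
  super    <- overlaps & pStart <= fStart & pEnd >= fEnd & !inside
  ovStart  <- overlaps & pStart <  fStart & pEnd <  fEnd
  ovEnd    <- overlaps & pStart >  fStart & pEnd >  fEnd
  out[ovStart] <- "overlapStart"
  out[ovEnd]   <- "overlapEnd"
  out[super]   <- "super"
  out[inside]  <- "inside"
  out
}

# Coordinates of the dummy peaks p1..p5 and features f1..f5 from this post
pStart <- c(1543200, 1557200, 1563000, 1569800, 167889600)
pEnd   <- c(1555199, 1560599, 1565199, 1573799, 167893599)
fStart <- c(1549800, 1554400, 1565000, 1569400, 167888600)
fEnd   <- c(1550599, 1560799, 1565399, 1571199, 167888999)

res <- classifyOverlap(pStart, pEnd, fStart, fEnd)
names(res) <- c("p1", "p2", "p3", "p4", "p5")
print(res)
# p1 "super", p2 "inside", p3 "overlapStart", p4 "overlapEnd", p5 "false"
```

The masks are mutually exclusive for overlapping pairs, so the order of the assignments does not affect the result. Handling a peak that spans several features would need a one-to-many lookup, which this sketch does not attempt.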

Apologies if this has already been addressed, or if others do not consider this 
useful.

Details, with a dummy example, are given below.

Many thanks,
Amy.

#####

In the dummy example below, p1 is bigger than f1 and spans it in its 
entirety. It would be nice if ChIPpeakAnno could report this, although I 
accept that a peak may span more than one feature, so one would need to 
consider how to deal with that.

As another example, p3 in fact overlaps the start of f3, but is called as 
insideFeature = FALSE. It would be nice if ChIPpeakAnno could report it as 
overlapStart.

p4 is called as insideFeature = TRUE because it overlaps the end of f4, but it 
would be nice if ChIPpeakAnno could report it as overlapEnd or something similar.

Finally, p2 is correctly called as insideFeature = TRUE for its overlap with 
f2: in this case the p2 range lies within the f2 range, as you would expect.


library(ChIPpeakAnno)
peaks = RangedData(
    IRanges(start = c(1543200, 1557200, 1563000, 1569800, 167889600),
            end   = c(1555199, 1560599, 1565199, 1573799, 167893599),
            names = c("p1", "p2", "p3", "p4", "p5")),
    strand = as.integer(1),
    space  = c(6, 6, 6, 6, 5))
features = RangedData(
    IRanges(start = c(1549800, 1554400, 1565000, 1569400, 167888600),
            end   = c(1550599, 1560799, 1565399, 1571199, 167888999),
            names = c("f1", "f2", "f3", "f4", "f5")),
    strand = as.integer(1),
    space  = c(6, 6, 6, 6, 5))

annoPeaks = annotatePeakInBatch(peaks,AnnotationData=features)

as.data.frame(annoPeaks)

  space     start       end width names strand feature start_position
1     5 167889600 167893599  4000    p5      1      f5      167888600
2     6   1543200   1555199 12000    p1      1      f1        1549800
3     6   1557200   1560599  3400    p2      1      f2        1554400
4     6   1563000   1565199  2200    p3      1      f3        1565000
5     6   1569800   1573799  4000    p4      1      f4        1569400
  end_position insideFeature distancetoFeature
1    167888999         FALSE              1000
2      1550599         FALSE             -6600
3      1560799          TRUE              2800
4      1565399         FALSE             -2000
5      1571199          TRUE               400
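
The two computed columns above are consistent with a simple rule that looks only at the peak start. The base R check below is a sketch of that inference, drawn from this output alone and not from the package source: it assumes plus-strand features, that insideFeature is TRUE when the peak start lies within the feature, and that distancetoFeature is the peak start minus the feature start. The variable names are illustrative only.

```r
# Coordinates in the row order of the table above (p5, p1, p2, p3, p4)
pStart <- c(p5 = 167889600, p1 = 1543200, p2 = 1557200, p3 = 1563000, p4 = 1569800)
fStart <- c(f5 = 167888600, f1 = 1549800, f2 = 1554400, f3 = 1565000, f4 = 1569400)
fEnd   <- c(f5 = 167888999, f1 = 1550599, f2 = 1560799, f3 = 1565399, f4 = 1571199)

# ASSUMED rule, inferred from the output only: test the peak start alone
inferredInside   <- pStart >= fStart & pStart <= fEnd
inferredDistance <- pStart - fStart

# Values copied from the insideFeature and distancetoFeature columns above
reportedInside   <- c(FALSE, FALSE, TRUE, FALSE, TRUE)
reportedDistance <- c(1000, -6600, 2800, -2000, 400)

print(all(unname(inferredInside) == reportedInside))
print(all(unname(inferredDistance) == reportedDistance))
```

If that rule is what the function applies, it would explain why p4 (peak start inside f4) is TRUE while p1 and p3 (peak start upstream of the feature start) are FALSE, even though all three overlap their feature.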


> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=C              LC_MESSAGES=C
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ChIPpeakAnno_1.3.0                  org.Hs.eg.db_2.3.6
 [3] GO.db_2.3.5                         RSQLite_0.7-3
 [5] DBI_0.2-4                           AnnotationDbi_1.8.0
 [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.14.0
 [9] Biostrings_2.14.2                   IRanges_1.5.18
[11] multtest_2.2.0                      Biobase_2.6.0
[13] biomaRt_2.3.0

loaded via a namespace (and not attached):
[1] MASS_7.3-3      RCurl_1.3-0     XML_2.6-0       splines_2.10.0
[5] survival_2.35-7



_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
