Re: [Bioc-sig-seq] ChIPpeakAnno annotatePeakInBatch function

Zhu, Julie Tue, 09 Mar 2010 10:53:18 -0800

Hi Khademul,

Thank you so much for sharing your comparison between ChIPpeakAnno and CEAS, 
and for the great suggestions! We will incorporate your ideas into the next 
release. Finding overlapping features is exactly the functionality we are 
planning to add.


Best regards,

Julie


On 3/9/10 1:40 PM, "Khademul Islam" <[email protected]> wrote:

hi Julie,

Thanks for your mail. I use CEAS but not CisGenome. I like ChIPpeakAnno 
because:

# It is in R, and it is fast.
# We can download the peak region sequences for motif finding, or the flanking 
regions for primer design.
# We can do GO enrichment.
# CEAS reports the distance of each peak from ALL RefSeq known genes (output as 
an Excel file), but we usually want ONLY the closest gene. We can also use 
Ensembl genes here, and we can use BioMart.
# We can compare two BED files and find out whether peaks from two TFs (one per 
BED file) overlap. If they do not, we can find out how far apart they are and 
which TF peak is the closest. We do not get this in CEAS.
# CEAS generates a pie chart of the distribution of peaks across different 
regions (intron, exon, promoter, distal intergenic, downstream, upstream, 
etc.), but it does not give the actual distances. ChIPpeakAnno gives at least 
the distance from the TSS and TTS, so it is easy to see how far into the 
promoter a peak lies.

I have some suggestions:
-------------------------------------

# For example, there is one large gene that contains a miRNA gene inside it. I 
observed that my peak is close to the start of the miRNA gene but actually 
resides inside the large gene, so ChIPpeakAnno reports the miRNA gene. However, 
in one analysis I wanted only the peaks that reside within a gene. In this case 
the "inside" feature (TRUE) did not help. So I used another tool, BEDTools, and 
then realized that although ChIPpeakAnno reports "FALSE" for the peak, it 
actually resides within a gene that is not reported. So in some cases it would 
be nice to have an "overlapping" feature within the one tool, ChIPpeakAnno.

## in another case:

my peak: 24,72,191 -- 24,172,382 position

ChIPpeakAnno found the closest gene (TSS-TTS): 24,134,709 -- 24,511,317 on the 
-ve strand

However, the peak is actually residing in one gene (TSS-TTS): +ve strand 
23,961,810--23,996,240

So in that case ChIPpeakAnno's calculation is right: the negative-strand gene 
it detects is the one closest to the peak. But the peak is still very far from 
the TSS of the -ve strand gene. So here it would be more interesting to know 
that the peak actually resides inside a gene (the +ve strand gene), although 
this gene's TSS is farther away than that of the -ve strand gene.

Considering this, another column reporting whether the peak overlaps any other 
gene would be interesting. Although the peak is far from the TSS, it could 
still affect gene splicing events or have other functions.
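
The case described above (the nearest TSS belongs to one gene while the peak sits inside another) can be sketched in base R. This is only an illustration: the gene names and coordinates are invented, and nothing here is ChIPpeakAnno's own code.

```r
# Hypothetical genes; coordinates are invented for illustration only
genes <- data.frame(
  name   = c("geneA", "geneB"),
  start  = c(1000, 5000),
  end    = c(9000, 6000),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# The TSS is the start on the + strand and the end on the - strand
genes$tss <- ifelse(genes$strand == "+", genes$start, genes$end)

# Hypothetical peak
peak <- c(start = 6500, end = 6700)

# Gene whose TSS is nearest to the peak start
nearest <- genes$name[which.min(abs(peak["start"] - genes$tss))]

# Every gene whose body overlaps the peak, regardless of TSS distance
overlapping <- genes$name[genes$start <= peak["end"] &
                          genes$end >= peak["start"]]

print(nearest)
print(overlapping)
```

Here the nearest TSS belongs to geneB on the - strand, while the peak lies inside geneA on the + strand, so reporting both columns keeps either answer from hiding the other.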

###

You may extend the features of the tool by making Venn diagrams of overlapping 
peaks. Currently I use BEDTools to find overlaps between two BED files, but I 
use another tool, Cistrome, to generate the Venn diagram, as BEDTools does not 
do that or report the number of overlapping peaks directly (although we can 
count them easily).
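
The counts that would feed such a Venn diagram can be sketched in base R as follows. The two peak sets are invented, all peaks are assumed to be on one chromosome, and the variable names are hypothetical. Drawing the diagram itself is left out.

```r
# Hypothetical peak sets on a single chromosome (coordinates invented)
tf1 <- data.frame(start = c(100, 500, 900), end = c(200, 600, 1000))
tf2 <- data.frame(start = c(150, 2000),     end = c(250, 2100))

# hits[i, j] is TRUE when tf1 peak i overlaps tf2 peak j
hits <- outer(tf1$start, tf2$end, "<=") & outer(tf1$end, tf2$start, ">=")

# Counts for the three Venn regions; "shared" is counted from the tf1 side
counts <- c(
  tf1Only = sum(rowSums(hits) == 0),
  shared  = sum(rowSums(hits) > 0),
  tf2Only = sum(colSums(hits) == 0)
)

print(counts)
```

Note that one peak can overlap several peaks in the other set, so the shared count may differ depending on which set it is counted from.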


### Maybe some graphical presentation would also strengthen ChIPpeakAnno.

### I guess there will be an updated tutorial with some example code when you 
update the package or include more features.

However, I would like to thank you for writing such a nice and helpful 
package; it is so useful in my daily ChIP-seq data analysis.

Wish you all the best with the publication, and looking forward to seeing the 
published version.

thank you,

Khademul

Visiting Scholar,
University of Illinois at Chicago, USA

[ PhD Student,
Barcelona Biomedical Research Park
Barcelona, Spain ]



============================================================================





============================================================

On Tue, Mar 9, 2010 at 9:02 AM, Zhu, Julie <[email protected]> wrote:
Hi Amy,

Thank you very much for the feedback!

The inside feature is TRUE if the peakSummit (stored as Start in the 
RangedData) is inside the nearest annotated feature, and FALSE otherwise. The 
feature you are proposing is useful for finding overlapping fragments between 
query peak ranges and target ranges. The timing of your suggestion could not be 
better. We plan to add a function called FindOverlappingPeaks to the dev 
version, and your ideas will be incorporated. Please let me know if you and 
others think it would be better to incorporate your ideas into 
annotatePeakInBatch. Thanks again for your help in making this package more 
useful.
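
As a rough illustration of the rule described above, the following base R sketch recomputes the two columns from the coordinates in the dummy example quoted further down in this thread. It is one reading of the description, not the package's source. It assumes each peak pN is paired with feature fN as its nearest feature, that everything is on the + strand, and that the distance is measured from the feature start to the peak start.

```r
# Peak starts and feature ranges copied from the dummy example in this thread
peakStart    <- c(p1 = 1543200, p2 = 1557200, p3 = 1563000,
                  p4 = 1569800, p5 = 167889600)
featureStart <- c(1549800, 1554400, 1565000, 1569400, 167888600)
featureEnd   <- c(1550599, 1560799, 1565399, 1571199, 167888999)

# TRUE when the peak start lies within [featureStart, featureEnd]
insideFeature <- peakStart >= featureStart & peakStart <= featureEnd

# Signed distance from the feature start to the peak start (+ strand assumed)
distanceToFeature <- peakStart - featureStart

print(data.frame(insideFeature, distanceToFeature))
```

Under these assumptions only the peak start matters, which is why a peak that begins upstream of a feature is reported as FALSE even when it overlaps the feature.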

I have a favor to ask of you and of others who have used or are aware of the 
ChIPpeakAnno package. We submitted a paper describing this package and just got 
the reviewers' comments back. One question that came up is what this package 
offers that existing annotation tools such as CisGenome and CEAS do not. I 
thought it might be useful to get feedback from the users of this package. 
If possible, could you please send me your thoughts on this, especially the 
reasons you chose to use this package? Thanks a lot for your time and help!

Best regards,

Julie


*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program Gene Function and Expression
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm



On 3/9/10 6:23 AM, "Amy Molesworth" <[email protected]> wrote:



Firstly I'd like to thank the authors of the very useful package ChIPpeakAnno. 
I'd like to report a feature in ChIPpeakAnno annotatePeakInBatch function 
results that other users may or may not be aware of. I also propose 
improvements to compensate.

The resulting insideFeature column reports TRUE if the query peak is 
contained within an annotated feature, and also reports TRUE if it overlaps the 
end of an annotated feature.

I think it's worth noting that it reports FALSE if the peak overlaps the 
beginning of an annotated feature, and also reports FALSE if the peak spans 
one or more annotated features in their entirety.

So, perhaps the insideFeature column (or additional new column called 
overlappingFeature) could report five options: 
("false","inside","overlapStart","overlapEnd","super"). I haven't looked into 
the effects on how distanceToFeature should/could be called for each different 
scenario.
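
A minimal base R sketch of the five proposed categories is given here. The function name classifyOverlap is hypothetical and not part of ChIPpeakAnno. It assumes closed intervals on the + strand (on the - strand the start and end labels would swap) and uses the coordinates from the dummy example in this message.

```r
# Hypothetical helper: classify how peak [pStart, pEnd] relates to
# feature [fStart, fEnd]; both are closed intervals on the + strand
classifyOverlap <- function(pStart, pEnd, fStart, fEnd) {
  ifelse(pEnd < fStart | pStart > fEnd, "false",          # no overlap
  ifelse(pStart >= fStart & pEnd <= fEnd, "inside",       # peak within feature
  ifelse(pStart <= fStart & pEnd >= fEnd, "super",        # peak spans feature
  ifelse(pStart < fStart, "overlapStart", "overlapEnd"))))
}

# Coordinates of p1-p5 and f1-f5 from the dummy example
pStart <- c(1543200, 1557200, 1563000, 1569800, 167889600)
pEnd   <- c(1555199, 1560599, 1565199, 1573799, 167893599)
fStart <- c(1549800, 1554400, 1565000, 1569400, 167888600)
fEnd   <- c(1550599, 1560799, 1565399, 1571199, 167888999)

overlapType <- setNames(classifyOverlap(pStart, pEnd, fStart, fEnd),
                        c("p1", "p2", "p3", "p4", "p5"))
print(overlapType)
```

With these inputs p1 comes out as "super", p2 as "inside", p3 as "overlapStart", p4 as "overlapEnd" and p5 as "false". A peak that overlaps more than one feature would need one label per feature, which this sketch does not handle.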

Apologies if this has already been addressed, or if others do not consider this 
useful.

Details with dummy example are described below.

Many thanks,
Amy.

#####

In the dummy example below, p1 is bigger than f1 and consequently p1 overlaps 
it in entirety. It would be nice if ChIPpeakAnno could report this - although I 
accept it may overlap more than one feature,
so would need to consider how to deal with that.

And another example from below, p3 in fact overlaps with the start of f3, but 
is called as insideFeature=FALSE. It would be nice if ChIPpeakAnno could report 
it as OverlapStart.

p4 is called as insideFeature = TRUE for overlapping with f4, but it would be 
nice if ChIPpeakAnno could report it as OverlapEnd or something similar.

And correctly p2 is called as insideFeature = TRUE for overlap with f2, in this 
case p2 ranges are within the f2 ranges as you would expect.


library(ChIPpeakAnno)

# Five query peaks (p1-p5) and five annotated features (f1-f5)
peaks = RangedData(
  IRanges(start=c(1543200,1557200,1563000,1569800,167889600),
          end=c(1555199,1560599,1565199,1573799,167893599),
          names=c("p1","p2","p3","p4","p5")),
  strand=as.integer(1), space=c(6,6,6,6,5))
features = RangedData(
  IRanges(start=c(1549800,1554400,1565000,1569400,167888600),
          end=c(1550599,1560799,1565399,1571199,167888999),
          names=c("f1","f2","f3","f4","f5")),
  strand=as.integer(1), space=c(6,6,6,6,5))

# Annotate each peak with its nearest feature
annoPeaks = annotatePeakInBatch(peaks,AnnotationData=features)

as.data.frame(annoPeaks)

  space     start       end width names strand feature start_position
1     5 167889600 167893599  4000    p5      1      f5      167888600
2     6   1543200   1555199 12000    p1      1      f1        1549800
3     6   1557200   1560599  3400    p2      1      f2        1554400
4     6   1563000   1565199  2200    p3      1      f3        1565000
5     6   1569800   1573799  4000    p4      1      f4        1569400
  end_position insideFeature distancetoFeature
1    167888999         FALSE              1000
2      1550599         FALSE             -6600
3      1560799          TRUE              2800
4      1565399         FALSE             -2000
5      1571199          TRUE               400


> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=C              LC_MESSAGES=C
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ChIPpeakAnno_1.3.0                  org.Hs.eg.db_2.3.6
 [3] GO.db_2.3.5                         RSQLite_0.7-3
 [5] DBI_0.2-4                           AnnotationDbi_1.8.0
 [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.14.0
 [9] Biostrings_2.14.2                   IRanges_1.5.18
[11] multtest_2.2.0                      Biobase_2.6.0
[13] biomaRt_2.3.0

loaded via a namespace (and not attached):
[1] MASS_7.3-3      RCurl_1.3-0     XML_2.6-0       splines_2.10.0
[5] survival_2.35-7



_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing




