Hi Khademul, Thank you so much for sharing your comparison between ChIPpeakAnno and CEAS, and the great suggestions! We will incorporate your ideas into the next release. The functionality of finding overlapping feature is exactly what we are planning to add.
Best regards, Julie On 3/9/10 1:40 PM, "Khademul Islam" <[email protected]> wrote: hi Julie, Thanks for your mail. I use CEAS but not CiSGenome. I liked ChIPpeakAnno because # its in R, and fast # We can download peak region sequence for motif finding, or flanking region for primer design # We can do GO enrichment # CEAS offer distance of peak from ALL RefSeq Known genes (out put in excel file), but we usually want to find ONLY those genes which is closest. and We can use Ensembl gene here............we can use Biomart here ............. # We can compare two bed files and if peaks from two TF (in two bed file) are overlapping or not; if not, how far are they or what is the closest TF......... even how far two closest TFs....... we can know (we dont get it in CEAS). ## CEAS generates a pie chart of graphical distribution of peaks in different region (intron, exon, promoter, distal intergenic, downstream, upstream.......etc)...... but does not say actual distance............ that we can get from ChIPpeakAnno......... not all but at least distance from TSS, TTS, so, how far in promoter easily. I have some suggestions: ------------------------------------- # For example, there is one big size gene and its contains a miRNA gene inside it. I have observed that my peak is close the start side of the miRNA gene but actually it resides inside the big gene. So, ChIPpeakAnno decide to report the miRNA gene. However, in one analysis, I wanted to know only peaks that are residing within a gene. In this case the feature "TRUE" did not work. So, I used another tool called "BedTools" .......... then I realize that although ChIPpeakAnno saying the peak is "FALSE", but it is actually residing within a gene which is not reported. So, in some cases it would be nice to see "overlapping" feature within one tool ChIPpeakAnno. ## in another case: my peak: 24,72,191 -- 24,172,382 position ChipPeakAnno Found closest gene (TSS-TTS): 24,134,709 -- 24,511,317 at -ve strand However, the peak is actually residing in one gene (TSS-TTS): +ve strand 23,961,810--23,996,240 So, in that case, ChIPpeakAnno has right calculation that for negative strand, the gene closest to the peak it detect is correct. But this peak is still far far far away from TSS of the -ve stranded gene. So, in this case it would be more interesting to know that peak is actually residing inside a gene (+ve strand gene) although this gene's TSS is more far away than the -ve strand gene. So, considering this, another column of feature, if it is overlapping with any other gene or not would be interesting to know. Because, although peak is far from TSS, but it can work on gene splicing events etc other function ......................... ### You may extent the feature of the tool by making venn diagram of overlapping peaks. Currently I use BedTools to find overlap between two bed files, but using another tools "Cistrome" to generate venn diagram as BedTools does not do that or give account of overlapping peak directly (although we can count easily.........) ### may be some graphical presentation can also increase strength of ChIPpeakAnno. ### I guess there would be updated tutorial with some example code when you update it or include some more features. However, I would like to to thank for writing such nice and helpful package........... it's so useful in my daily ChIPseq data analysis. Whish you all the best with publication and looking fwd to see published version. thank you, Khademul Visiting Scholar, University of illinois at Chicago, USA [ PhD Student, Barcelona Biomedical Research Park Barcelona, Spain ] ============================================================================ ============================================================ On Tue, Mar 9, 2010 at 9:02 AM, Zhu, Julie <[email protected]> wrote: Hi Amy, Thank you very much for the feedback! The inside feature is TRUE if the peakSummit (stored as Start in the RangedData) is inside the nearest annotated feature, FALSE otherwise. The feature you are proposing is useful for finding overlapping fragments between query peak ranges and target ranges. The timing of your suggestion cannot be better. We plan to add a function called FindOverlappingPeaks that will be added to the dev version, your ideas will be incorporated. Please let me know if you and others think that it is better to incorporate your ideas into annotatePeakInBatch. Thanks again for your help making this package more useful. I have a favor to ask you and those who have used or are aware of the ChIPpeakAnno package. We submitted a paper describing this package and just got the reviewers' comments back. One question comes up is that what this package offers that the existing annotation tools such as Cisgenome and CEAS do not. I thought it might be useful to get the feedback from the users of this package. If possible, could you please send me your thoughts on this, especially the reasons you chose using this package? Thanks a lot for your time and help! Best regards, Julie ******************************************* Lihua Julie Zhu, Ph.D Research Associate Professor Program Gene Function and Expression University of Massachusetts Medical School 364 Plantation Street, Room 613 Worcester, MA 01605 508-856-5256 http://www.umassmed.edu/pgfe/faculty/zhu.cfm On 3/9/10 6:23 AM, "Amy Molesworth" <[email protected] <http://[email protected]> > wrote: Firstly I'd like to thank the authors of the very useful package ChIPpeakAnno. I'd like to report a feature in ChIPpeakAnno annotatePeakInBatch function results that other users may or may not be aware of. I also propose improvements to compensate. The resulting insideFeature column reports TRUE if the query peak is either contained within an annotated feature, and also reports TRUE if it overlaps the end of an annotated feature. I think its worth noting that it reports FALSE if the peak overlaps the beginning of an annotated feature, and also reports FALSE if the peak overlaps in entirety an annotated feature(s). So, perhaps the insideFeature column (or additional new column called overlappingFeature) could report five options: ("false","inside","overlapStart","overlapEnd","super"). I haven't looked into the effects on how distanceToFeature should/could be called for each different scenario. Apologies if this has already been addressed, or if others do not consider this useful. Details with dummy example are described below. Many thanks, Amy. ##### In the dummy example below, p1 is bigger than f1 and consequently p1 overlaps it in entirety. It would be nice if ChIPpeakAnno could report this - although I accept it may overlap more than one feature, so would need to consider how to deal with that. And another example from below, p3 in fact overlaps with the start of f3, but is called as insideFeature=FALSE. It would be nice if ChIPpeakAnno could report it as OverlapStart. p4 is called as insideFeature = TRUE for overlapping with f4, but it would be nice if ChIPpeakAnno could report it as OverlapEnd or something similar. And correctly p2 is called as insideFeature = TRUE for overlap with f2, in this case p2 ranges are within the f2 ranges as you would expect. library(ChIPpeakAnno) peaks = RangedData(IRanges(start=c(1543200,1557200,1563000,1569800,167889600),end=c(1555199,1560599,1565199,1573799,167893599),names=c("p1","p2","p3","p4","p5")),strand=as.integer(1),space=c(6,6,6,6,5)) features = RangedData(IRanges(start=c(1549800,1554400,1565000,1569400,167888600),end=c(1550599,1560799,1565399,1571199,167888999),names=c("f1","f2","f3","f4","f5")),strand=as.integer(1),space=c(6,6,6,6,5)) annoPeaks = annotatePeakInBatch(peaks,AnnotationData=features) as.data.frame(annoPeaks) space start end width names strand feature start_position 1 5 167889600 167893599 4000 p5 1 f5 167888600 2 6 1543200 1555199 12000 p1 1 f1 1549800 3 6 1557200 1560599 3400 p2 1 f2 1554400 4 6 1563000 1565199 2200 p3 1 f3 1565000 5 6 1569800 1573799 4000 p4 1 f4 1569400 end_position insideFeature distancetoFeature 1 167888999 FALSE 1000 2 1550599 FALSE -6600 3 1560799 TRUE 2800 4 1565399 FALSE -2000 5 1571199 TRUE 400 > sessionInfo() R version 2.10.0 (2009-10-26) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=C [5] LC_MONETARY=C LC_MESSAGES=C [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ChIPpeakAnno_1.3.0 org.Hs.eg.db_2.3.6 [3] GO.db_2.3.5 RSQLite_0.7-3 [5] DBI_0.2-4 AnnotationDbi_1.8.0 [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.14.0 [9] Biostrings_2.14.2 IRanges_1.5.18 [11] multtest_2.2.0 Biobase_2.6.0 [13] biomaRt_2.3.0 loaded via a namespace (and not attached): [1] MASS_7.3-3 RCurl_1.3-0 XML_2.6-0 splines_2.10.0 [5] survival_2.35-7 ----------------------------------------------------------- This e-mail was sent by GlaxoSmithKline Services Unlimited (registered in England and Wales No. 1047315), which is a member of the GlaxoSmithKline group of companies. The registered address of GlaxoSmithKline Services Unlimited is 980 Great West Road, Brentford, Middlesex TW8 9GS. ----------------------------------------------------------- [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] <http://[email protected]> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
