Hi Wolfgang,

Thank you very much for looking into this! For some exon IDs, there is more
than one row, each with a different description. I thought this was an entry
error. Now that you have pointed out that the multiple rows for such an exon
are intentional, I understand that they reflect an incomplete understanding of
the exon rather than a database issue. I am curious why such an exon gets
separate rows instead of one row with all possible descriptions. Perhaps
different evidence is associated with each row. Is there a flag for each such
row that indicates its rank or plausibility? If there is, I will incorporate
the flag into the getAnnotation function in ChIPpeakAnno. I would deeply
appreciate your thoughts on this.
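
To make the "one row with all possible descriptions" idea concrete, here is a
minimal base-R sketch. It assumes a data frame shaped like the TSS table quoted
further down; the toy rows, the shortened descriptions, the second exon ID and
the " | " separator are made up for illustration, and this is not what
getAnnotation currently does.

```r
# Toy stand-in for the exon table; column names follow the quoted output,
# the values in the third row and the shortened descriptions are hypothetical.
TSS <- data.frame(
  ensembl_exon_id  = c("AT1G68552.1-E12203", "AT1G68552.1-E12203",
                       "AT0G00000.1-E1"),          # hypothetical third ID
  chromosome_name  = c("1", "1", "1"),
  exon_chrom_start = c(25727627, 25727627, 100),
  exon_chrom_end   = c(25727701, 25727701, 200),
  strand           = c(-1, -1, 1),
  description      = c("CPuORF53 (shortened)",
                       "AP2 domain-containing transcription factor (shortened)",
                       "hypothetical exon"),
  stringsAsFactors = FALSE
)

# Join the distinct descriptions that share an exon ID into one string.
descr <- vapply(
  split(TSS$description, TSS$ensembl_exon_id),
  function(d) paste(unique(d), collapse = " | "),
  character(1)
)

# Keep the first row of each exon ID and attach the joined description.
collapsed <- TSS[!duplicated(TSS$ensembl_exon_id), ]
collapsed$description <- unname(descr[collapsed$ensembl_exon_id])

# With one row per exon ID, the IDs can be used as row names.
rownames(collapsed) <- collapsed$ensembl_exon_id
```

Joining the descriptions keeps every annotation visible while making the exon
IDs unique. A rank or plausibility flag, if one exists, could later replace the
plain concatenation.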

Thanks again for helping me to have a deeper understanding of the system!

Best regards,

Julie


*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program Gene Function and Expression
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm
*******************************************



On 3/17/10 9:48 AM, "Wolfgang Huber" <[email protected]> wrote:

Julie

Why do you say that "the database contains errors"? I had a look at
http://gbrowse.arabidopsis.org/cgi-bin/gbrowse/arabidopsis/?name=AT1G68552.1
and while this is perhaps a complex locus whose expression we have not
yet fully understood, or not yet properly formalised into the database's
ontology of genomic features and gene products, I am not sure "error" is
the right term for that.

Arabidopsis people might have more insight on that.

        Wolfgang




Zhu, Julie wrote on 16/03/10 22:56:
> Hi,
>
> I obtained the exon sequences and here are the duplicate exon IDs with 
> different descriptions.
>
> TSS[duplicated(TSS[,1]), 1]
>  [1] "AT1G68552.1-E12203"  "AT1G64140.1-E14755"  "AT1G64140.1-E14756"  "AT1G70780.1-E4116"
>  [5] "AT1G75390.1-E22428"  "AT1G06149.1-E1988"   "AT1G36730.1-E35050"  "AT1G36730.1-E35051"
>  [9] "AT1G29952.1-E5728"   "AT1G29952.1-E5730"   "AT1G29952.1-E5732"   "AT1G29970.2-E8863"
> [13] "AT1G29970.2-E8864"   "AT1G64628.1-E10574"  "AT1G25470.1-E20679"  "AT1G58120.1-E18468"
> [17] "AT1G29041.1-E15117"  "AT1G23149.1-E13728"  "AT1G29952.1-E5728"   "AT1G29952.1-E5732"
> [21] "AT2G18162.1-E49029"  "AT3G51632.1-E98183"  "AT3G22970.1-E89708"  "AT3G45240.2-E86808"
> [25] "AT3G18000.1-E98438"  "AT3G59052.1-E77046"  "AT3G62422.1-E76351"  "AT3G25570.1-E88575"
> [29] "AT3G25570.1-E88576"  "AT3G10910.1-E77164"  "AT3G02468.1-E88931"  "AT3G12010.1-E78704"
> [33] "AT3G01470.1-E92685"  "AT3G53402.1-E93478"  "AT3G26430.1-E85151"  "AT3G26430.1-E85154"
> [37] "AT4G19110.1-E121565" "AT4G22592.1-E113550" "AT4G22592.1-E113551" "AT4G22592.1-E113552"
> [41] "AT4G12430.1-E113931" "AT4G12430.1-E113932" "AT4G12430.1-E113933" "AT4G25670.1-E111076"
> [45] "AT4G25670.1-E111077" "AT4G36990.1-E122859" "AT4G14620.1-E120308" "AT4G34590.1-E116802"
> [49] "AT5G09460.1-E136355" "AT5G09460.1-E136357" "AT5G50010.1-E151574" "AT5G50010.1-E151576"
> [53] "AT5G50010.1-E151574" "AT5G50011.1-E153108" "AT5G50011.1-E153110" "AT5G09460.1-E136355"
> [57] "AT5G09463.1-E151757" "AT5G09463.1-E151758" "AT5G52552.1-E136887" "AT5G52552.1-E136888"
> [61] "AT5G41992.1-E154552" "AT5G64341.1-E144370" "AT5G64341.1-E144371" "AT5G64341.1-E144373"
> [65] "AT5G64341.1-E144370" "AT5G64341.1-E144371" "AT5G64343.1-E148873" "AT5G64341.1-E144373"
> [69] "AT5G09460.1-E136355" "AT5G09463.1-E151757" "AT5G09460.1-E136357" "AT5G09463.1-E151758"
> [73] "AT5G49448.1-E171824" "AT5G05282.1-E152619" "AT5G53588.1-E159453" "AT5G09670.2-E157563"
> [77] "AT5G01710.1-E140929" "AT5G64341.1-E144370" "AT5G64343.1-E148873" "AT5G61230.1-E153842"
> [81] "AT5G61230.1-E153843" "AT5G60550.1-E140873" "AT5G64552.1-E148753" "AT5G64552.1-E148754"
> [85] "AT5G45430.1-E151338"
>
> For example,
>
> TSS[TSS[,1]=="AT1G68552.1-E12203",]
>          ensembl_exon_id chromosome_name exon_chrom_start exon_chrom_end strand
> 3125  AT1G68552.1-E12203               1         25727627       25727701     -1
> 15537 AT1G68552.1-E12203               1         25727627       25727701     -1
>
> description, row 3125:
>   CPuORF53 (Conserved peptide upstream open reading frame 53); Upstream
>   open reading frames (uORFs) are small open reading frames found in the 5' UTR
>   of a mature mRNA, and can potentially mediate translational regulation of the
>   largest, or major, ORF (mORF). CPuORF53 represents a conserved upstream
>   opening reading frame relative to major ORF AT1G68550.1
>
> description, row 15537:
>   AP2 domain-containing transcription factor, putative;
>   encodes a member of the ERF (ethylene response factor) subfamily B-6 of
>   ERF/AP2 transcription factor family. The protein contains one AP2 domain.
>   There are 12 members in this subfamily including RAP2.11.
>
> So I think the database contains errors. In this case, it will require manual 
> curation to determine which row to choose. Did you contact ensembl about 
> this? Thanks!
>
> Best regards,
>
> Julie
>
> On 3/5/10 6:46 PM, "[email protected]" <[email protected]> wrote:
>
>
>
>  Dear bioc-sig-sequencing,
>
> I would like to annotate chip-seq peaks for the arabidopsis genome.  "TSS" 
> and "Exon" are two of the arguments for the 'getAnnotation' function.  The 
> "TSS" argument succeeded, but the "Exon" argument failed.
>
> ...
>> arabdset<-useMart(biomart="plant_mart_4", dataset = "athaliana_eg_gene")
> Checking attributes ... ok
> Checking filters ... ok
>> ExonArabAnno<-getAnnotation(arabdset, featureType="Exon")
> Error in `rownames<-`(`*tmp*`, value = c("ATCG00010.1-E176369", 
> "ATMG00010.1-E176520",  :
>   duplicate rownames not allowed
>
>> sessionInfo()
> R version 2.11.0 Under development (unstable) (2010-02-28 r51186)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] ChIPpeakAnno_1.3.4                  org.Hs.eg.db_2.3.6
>  [3] GO.db_2.3.5                         RSQLite_0.8-3
>  [5] DBI_0.2-5                           AnnotationDbi_1.9.4
>  [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.15.11
>  [9] Biostrings_2.15.22                  IRanges_1.5.51
> [11] multtest_2.3.0                      Biobase_2.7.4
> [13] biomaRt_2.3.4
>
> loaded via a namespace (and not attached):
> [1] MASS_7.3-5      RCurl_1.3-1     splines_2.11.0  survival_2.35-8
> [5] tools_2.11.0    XML_2.6-0
>
> Can someone comment?
>
>
> Thanks,
> P. Terry
> [email protected]
>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
>
>
>


--

Best wishes
      Wolfgang


--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber/contact





