Hi,

I obtained the exon sequences and here are the duplicate exon IDs with 
different descriptions.

TSS[duplicated(TSS[,1]), 1]
 [1] "AT1G68552.1-E12203"  "AT1G64140.1-E14755"  "AT1G64140.1-E14756"  "AT1G70780.1-E4116"
 [5] "AT1G75390.1-E22428"  "AT1G06149.1-E1988"   "AT1G36730.1-E35050"  "AT1G36730.1-E35051"
 [9] "AT1G29952.1-E5728"   "AT1G29952.1-E5730"   "AT1G29952.1-E5732"   "AT1G29970.2-E8863"
[13] "AT1G29970.2-E8864"   "AT1G64628.1-E10574"  "AT1G25470.1-E20679"  "AT1G58120.1-E18468"
[17] "AT1G29041.1-E15117"  "AT1G23149.1-E13728"  "AT1G29952.1-E5728"   "AT1G29952.1-E5732"
[21] "AT2G18162.1-E49029"  "AT3G51632.1-E98183"  "AT3G22970.1-E89708"  "AT3G45240.2-E86808"
[25] "AT3G18000.1-E98438"  "AT3G59052.1-E77046"  "AT3G62422.1-E76351"  "AT3G25570.1-E88575"
[29] "AT3G25570.1-E88576"  "AT3G10910.1-E77164"  "AT3G02468.1-E88931"  "AT3G12010.1-E78704"
[33] "AT3G01470.1-E92685"  "AT3G53402.1-E93478"  "AT3G26430.1-E85151"  "AT3G26430.1-E85154"
[37] "AT4G19110.1-E121565" "AT4G22592.1-E113550" "AT4G22592.1-E113551" "AT4G22592.1-E113552"
[41] "AT4G12430.1-E113931" "AT4G12430.1-E113932" "AT4G12430.1-E113933" "AT4G25670.1-E111076"
[45] "AT4G25670.1-E111077" "AT4G36990.1-E122859" "AT4G14620.1-E120308" "AT4G34590.1-E116802"
[49] "AT5G09460.1-E136355" "AT5G09460.1-E136357" "AT5G50010.1-E151574" "AT5G50010.1-E151576"
[53] "AT5G50010.1-E151574" "AT5G50011.1-E153108" "AT5G50011.1-E153110" "AT5G09460.1-E136355"
[57] "AT5G09463.1-E151757" "AT5G09463.1-E151758" "AT5G52552.1-E136887" "AT5G52552.1-E136888"
[61] "AT5G41992.1-E154552" "AT5G64341.1-E144370" "AT5G64341.1-E144371" "AT5G64341.1-E144373"
[65] "AT5G64341.1-E144370" "AT5G64341.1-E144371" "AT5G64343.1-E148873" "AT5G64341.1-E144373"
[69] "AT5G09460.1-E136355" "AT5G09463.1-E151757" "AT5G09460.1-E136357" "AT5G09463.1-E151758"
[73] "AT5G49448.1-E171824" "AT5G05282.1-E152619" "AT5G53588.1-E159453" "AT5G09670.2-E157563"
[77] "AT5G01710.1-E140929" "AT5G64341.1-E144370" "AT5G64343.1-E148873" "AT5G61230.1-E153842"
[81] "AT5G61230.1-E153843" "AT5G60550.1-E140873" "AT5G64552.1-E148753" "AT5G64552.1-E148754"
[85] "AT5G45430.1-E151338"

For example,

TSS[TSS[,1]=="AT1G68552.1-E12203",]
         ensembl_exon_id chromosome_name exon_chrom_start exon_chrom_end strand
3125  AT1G68552.1-E12203               1         25727627       25727701     -1
15537 AT1G68552.1-E12203               1         25727627       25727701     -1

      description
3125  CPuORF53 (Conserved peptide upstream open reading frame 53); Upstream
      open reading frames (uORFs) are small open reading frames found in the
      5' UTR of a mature mRNA, and can potentially mediate translational
      regulation of the largest, or major, ORF (mORF). CPuORF53 represents a
      conserved upstream opening reading frame relative to major ORF
      AT1G68550.1
15537 AP2 domain-containing transcription factor, putative; encodes a member
      of the ERF (ethylene response factor) subfamily B-6 of ERF/AP2
      transcription factor family. The protein contains one AP2 domain. There
      are 12 members in this subfamily including RAP2.11.
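
To check every duplicated ID at once rather than one at a time, something along these lines may help. It is only a sketch on a toy data frame: the column names are the ones shown above, but the descriptions are shortened and the third row (ID "AT0G00000.1-E1") is invented purely for illustration. With the real data you would skip the toy `TSS` and run the rest on your own object.

```r
# Toy stand-in for the real TSS data frame (assumption: same column names).
# Rows 1-2 mirror the example above; row 3 is made up for illustration.
TSS <- data.frame(
  ensembl_exon_id  = c("AT1G68552.1-E12203", "AT1G68552.1-E12203",
                       "AT0G00000.1-E1"),
  chromosome_name  = c("1", "1", "1"),
  exon_chrom_start = c(25727627, 25727627, 100),
  exon_chrom_end   = c(25727701, 25727701, 200),
  strand           = c(-1, -1, 1),
  description      = c("CPuORF53",
                       "AP2 domain-containing transcription factor",
                       "hypothetical entry"),
  stringsAsFactors = FALSE
)

# IDs that occur more than once
dupIds <- unique(TSS$ensembl_exon_id[duplicated(TSS$ensembl_exon_id)])

# All rows carrying one of those IDs (duplicated() alone skips the first copy)
dupRows <- TSS[TSS$ensembl_exon_id %in% dupIds, ]

# For each duplicated ID, list the columns that take more than one value
differingCols <- lapply(split(dupRows, dupRows$ensembl_exon_id), function(d) {
  cols <- setdiff(names(d), "ensembl_exon_id")
  cols[sapply(d[cols], function(x) length(unique(x)) > 1)]
})
differingCols
```

On the toy data the only differing column is `description`, which matches the example above. Whether that holds for all 85 entries in the real table is something the output would have to show.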

So I think the database contains errors: the same exon ID, with identical 
coordinates, carries two unrelated descriptions. In such cases it will require 
manual curation to determine which row to keep. Did you contact Ensembl about 
this? Thanks!
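
Until the annotation is curated, one possible stopgap is to collapse each duplicated ID into a single row so that the IDs can be used as row names. The sketch below assumes the duplicates agree on everything except the description (worth verifying first) and joins the competing descriptions with " | ". The toy data frame, the invented third row and the separator are my own choices for illustration; this is not what getAnnotation does internally.

```r
# Toy stand-in for the real table (assumption: same column names as above).
# Descriptions are shortened; the third row is made up for illustration.
TSS <- data.frame(
  ensembl_exon_id  = c("AT1G68552.1-E12203", "AT1G68552.1-E12203",
                       "AT0G00000.1-E1"),
  exon_chrom_start = c(25727627, 25727627, 100),
  exon_chrom_end   = c(25727701, 25727701, 200),
  description      = c("CPuORF53",
                       "AP2 domain-containing transcription factor",
                       "hypothetical entry"),
  stringsAsFactors = FALSE
)

# Join the distinct descriptions found for each exon ID
desc <- tapply(TSS$description, TSS$ensembl_exon_id,
               function(x) paste(unique(x), collapse = " | "))

# Keep the first row per ID, then attach the merged description
exons <- TSS[!duplicated(TSS$ensembl_exon_id), ]
exons$description <- as.character(desc[exons$ensembl_exon_id])

# The IDs are unique now, so assigning them as row names no longer fails
rownames(exons) <- exons$ensembl_exon_id
exons
```

Keeping both descriptions in the merged row avoids silently picking one of them, so the ambiguous entries stay visible for later curation.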

Best regards,

Julie


*******************************************
Lihua Julie Zhu, Ph.D
Research Associate Professor
Program Gene Function and Expression
University of Massachusetts Medical School
364 Plantation Street, Room 613
Worcester, MA 01605
508-856-5256
http://www.umassmed.edu/pgfe/faculty/zhu.cfm
*******************************************

On 3/5/10 6:46 PM, "[email protected]" <[email protected]> wrote:



 Dear bioc-sig-sequencing,

I would like to annotate ChIP-seq peaks for the Arabidopsis genome.  "TSS" and 
"Exon" are two of the values accepted by the featureType argument of the 
'getAnnotation' function.  The call with "TSS" succeeded, but the call with 
"Exon" failed.

...
> arabdset<-useMart(biomart="plant_mart_4", dataset = "athaliana_eg_gene")
Checking attributes ... ok
Checking filters ... ok
> ExonArabAnno<-getAnnotation(arabdset, featureType="Exon")
Error in `rownames<-`(`*tmp*`, value = c("ATCG00010.1-E176369", 
"ATMG00010.1-E176520",  :
  duplicate rownames not allowed

> sessionInfo()
R version 2.11.0 Under development (unstable) (2010-02-28 r51186)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ChIPpeakAnno_1.3.4                  org.Hs.eg.db_2.3.6
 [3] GO.db_2.3.5                         RSQLite_0.8-3
 [5] DBI_0.2-5                           AnnotationDbi_1.9.4
 [7] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.15.11
 [9] Biostrings_2.15.22                  IRanges_1.5.51
[11] multtest_2.3.0                      Biobase_2.7.4
[13] biomaRt_2.3.4

loaded via a namespace (and not attached):
[1] MASS_7.3-5      RCurl_1.3-1     splines_2.11.0  survival_2.35-8
[5] tools_2.11.0    XML_2.6-0
>

Can someone comment?


Thanks,
P. Terry
[email protected]

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing



