I am using the same table with both the BioJava and BioPerl taxon programs, and the output I get is below:

*BioJava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii.

BioJava's process of finding names: 11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240 (the wrong way of doing it)

*BioPerl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.

BioPerl's process of finding names: 11876==>353825==>153057==>327045==>11632 (the right way of doing it)

Hint: BioJava searches the ncbi_taxon_id column with the value from parent_taxon_id, whereas BioPerl searches the taxon_id column with the value from parent_taxon_id.

*Taxon and Taxon_name table content relevant to this discussion:*

taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name
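
To make the difference concrete, here is a minimal Python sketch (not BioJava or BioPerl code) that walks the lineage over the rows listed above in both ways. The `Taxon` tuple and the `lineage` function are names made up for this illustration; the only assumption about the two libraries is the lookup column described in the hint.

```python
from collections import namedtuple

# Hypothetical record type for this illustration; fields mirror the table columns.
Taxon = namedtuple("Taxon", "taxon_id ncbi_taxon_id parent_taxon_id node_rank name")

# Rows copied from the table in this message.
ROWS = [
    Taxon(2901, 3609, 276240, "genus", "Rhamnus"),
    Taxon(3610, 4403, 3609, "species", "Platanus occidentalis"),
    Taxon(29052, 48579, 4403, "species", "Suillus placidus"),
    Taxon(114412, 143975, 48579, "species", "Diadasia australis"),
    Taxon(143976, 176516, 143975, "species", "Arnicastrum guerrerense"),
    Taxon(30680, 50447, 176516, "family", "Labiduridae"),
    Taxon(254757, 301952, 50447, "varietas", "Oreostemma alpigenum var. haydenii"),
    Taxon(9394, 11632, 17394, "family", "Retroviridae"),
    Taxon(277861, 327045, 9394, "subfamily", "Orthoretrovirinae"),
    Taxon(122448, 153057, 277861, "genus", "Alpharetrovirus"),
    Taxon(301952, 353825, 122448, "no rank", "unclassified Alpharetrovirus"),
    Taxon(9584, 11876, 301952, "species", "Avian sarcoma virus"),
]


def lineage(start_ncbi_id, lookup_column):
    """Follow parent_taxon_id upwards, matching it against lookup_column.

    lookup_column="ncbi_taxon_id" mimics the behaviour described for BioJava;
    lookup_column="taxon_id" mimics the behaviour described for BioPerl.
    """
    by_ncbi = {t.ncbi_taxon_id: t for t in ROWS}
    index = {getattr(t, lookup_column): t for t in ROWS}
    current = by_ncbi[start_ncbi_id]
    visited = {current.taxon_id}  # guard against cycles in bad data
    names = []
    while True:
        parent = index.get(current.parent_taxon_id)
        if parent is None or parent.taxon_id in visited:
            break  # parent not in this excerpt of the table, or a loop
        visited.add(parent.taxon_id)
        names.append(parent.name)
        current = parent
    return names[::-1]  # root-most name first


print("; ".join(lineage(11876, "ncbi_taxon_id")))  # lookup as described for BioJava
print("; ".join(lineage(11876, "taxon_id")))       # lookup as described for BioPerl
```

With these rows, the first call should reproduce the Rhamnus ... Oreostemma lineage and the second the Retroviridae ... unclassified Alpharetrovirus lineage.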


Thanks
Deepak

On 4/11/2010 2:53 PM, Richard Holland wrote:
I'm sorry but I don't understand your example. Could you provide a real example 
of correct values for each column from a sample taxon entry in NCBI, plus an 
example of what BioJava is doing wrong? (i.e. give a sample record to use as 
reference, then point out the correct value of parent_taxon_id, and point out 
what value BioJava is using instead).

thanks,
Richard

On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:

Hi,

There is a very fundamental issue in the SimpleNCBITaxon class, because of which it 
produces the wrong taxonomy hierarchy. I am explaining what I have found; let me know 
what you think of it, and suggest how to fix it.

1) The columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, 
node_rank, genetic_code, mito_genetic_code, left_value, right_value).
2) In the SimpleNCBITaxon class we assume that "parent_taxon_id" holds the parent's 
ncbi_taxon_id for the current ncbi_taxon_id, but that is not true. The value in 
"parent_taxon_id" is the "taxon_id" of the row that holds the parent's ncbi_taxon_id.

<property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
<property name="nodeRank" column="node_rank"/>
<property name="geneticCode" column="genetic_code"/>
<property name="mitoGeneticCode" column="mito_genetic_code"/>
<property name="leftValue" column="left_value"/>
<property name="rightValue" column="right_value"/>
<property name="parentNCBITaxID" column="parent_taxon_id"/>       ----- this is not 
correct: the parent_taxon_id column stores the taxon_id of the row that holds the parent's ncbi_taxon_id for the current entry
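
As a rough illustration of point 2, the parent's NCBI id can be resolved with a self-join on taxon_id rather than by reading parent_taxon_id directly. The sketch below uses Python's sqlite3 with a cut-down taxon table (only four of the columns) and three rows from the example table elsewhere in this thread; the function name `parent_ncbi_id` is hypothetical, and this is not how BioJava itself is implemented.

```python
import sqlite3

# In-memory database with a simplified taxon table (assumption: only the
# columns needed for the illustration are created).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (
    taxon_id        INTEGER PRIMARY KEY,
    ncbi_taxon_id   INTEGER,
    parent_taxon_id INTEGER,
    node_rank       TEXT
);
INSERT INTO taxon VALUES (9584,   11876,  301952, 'species');
INSERT INTO taxon VALUES (301952, 353825, 122448, 'no rank');
INSERT INTO taxon VALUES (254757, 301952, 50447,  'varietas');
""")


def parent_ncbi_id(conn, ncbi_taxon_id):
    """Return the parent's ncbi_taxon_id, joining parent_taxon_id to taxon_id."""
    row = conn.execute(
        """
        SELECT parent.ncbi_taxon_id
        FROM taxon AS child
        JOIN taxon AS parent ON parent.taxon_id = child.parent_taxon_id
        WHERE child.ncbi_taxon_id = ?
        """,
        (ncbi_taxon_id,),
    ).fetchone()
    return row[0] if row else None  # None when the parent row is not present


print(parent_ncbi_id(conn, 11876))
```

Treating parent_taxon_id = 301952 as an NCBI id would instead land on the unrelated row whose ncbi_taxon_id happens to be 301952.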

Thanks
Deepak Sheoran


--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/


_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l