Thanks Deepak. 

I've had a look at the code and I believe it's due to the different ways in 
which BioJava and BioPerl load the taxon table. 

BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the values 
from the NCBI taxonomy file. The taxon_id column in BioJava is a meaningless 
auto-generated value that is never used.

BioPerl, however, generates its own taxon_id values and links records by 
setting parent_taxon_id to the generated taxon_id of the parent. The parent 
value from the NCBI taxonomy file is therefore replaced with the 
BioPerl-generated ID, so parent_taxon_id points at taxon_id rather than at 
ncbi_taxon_id as it does in BioJava. (I'm basing this on a reading of 
load_ncbi_taxonomy.pl from the BioSQL archives.)
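
To make the difference concrete, here is a minimal Python sketch of the two 
loading conventions as I understand them. It is not the actual BioJava or 
BioPerl loader code: the function names are hypothetical, the surrogate 
taxon_id values simply count up from 1, and the NCBI parent links are the ones 
implied by the lineage quoted further down this thread.

```python
# NCBI (tax_id, parent_tax_id) pairs implied by the lineage quoted in this thread.
# The parent of 11632 lies outside the excerpt, so it is left as None here.
NCBI_NODES = [
    (11632, None),      # Retroviridae
    (327045, 11632),    # Orthoretrovirinae
    (153057, 327045),   # Alpharetrovirus
    (353825, 153057),   # unclassified Alpharetrovirus
    (11876, 353825),    # Avian sarcoma virus
]


def load_ncbi_parent_style(nodes):
    """BioJava-like convention (assumed): parent_taxon_id keeps the NCBI parent id.

    taxon_id is only a surrogate key and plays no part in the hierarchy.
    """
    rows = []
    for surrogate, (ncbi_id, ncbi_parent) in enumerate(nodes, start=1):
        rows.append({
            "taxon_id": surrogate,
            "ncbi_taxon_id": ncbi_id,
            "parent_taxon_id": ncbi_parent,
        })
    return rows


def load_surrogate_parent_style(nodes):
    """BioPerl-like convention (assumed): parent_taxon_id is rewritten to the
    generated taxon_id of the parent row."""
    surrogate_of = {ncbi_id: i for i, (ncbi_id, _) in enumerate(nodes, start=1)}
    rows = []
    for ncbi_id, ncbi_parent in nodes:
        rows.append({
            "taxon_id": surrogate_of[ncbi_id],
            "ncbi_taxon_id": ncbi_id,
            # .get() returns None when the parent is not in this excerpt
            "parent_taxon_id": surrogate_of.get(ncbi_parent),
        })
    return rows
```

The same NCBI input ends up with different parent_taxon_id values depending on 
which function loads it, which is the mismatch described above.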

I believe if you load the taxonomy table using BioJava, you should see BioJava 
giving correct behaviour. Likewise if you load it using BioPerl, BioPerl will 
behave correctly. But if you load with one then query with the other, you'll 
get incorrect results.
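
As a rough illustration, the sketch below walks the lineage for ncbi_taxon_id 
11876 over the rows Deepak posted, once resolving parent_taxon_id against 
ncbi_taxon_id and once against taxon_id. The lineage() helper and its 
parent_key_column parameter are made up for this example; this is not code 
from either project.

```python
# (taxon_id, ncbi_taxon_id, parent_taxon_id, name) rows copied from the
# table quoted below in this thread.
ROWS = [
    (2901, 3609, 276240, "Rhamnus"),
    (3610, 4403, 3609, "Platanus occidentalis"),
    (29052, 48579, 4403, "Suillus placidus"),
    (114412, 143975, 48579, "Diadasia australis"),
    (143976, 176516, 143975, "Arnicastrum guerrerense"),
    (30680, 50447, 176516, "Labiduridae"),
    (254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
    (9394, 11632, 17394, "Retroviridae"),
    (277861, 327045, 9394, "Orthoretrovirinae"),
    (122448, 153057, 277861, "Alpharetrovirus"),
    (301952, 353825, 122448, "unclassified Alpharetrovirus"),
    (9584, 11876, 301952, "Avian sarcoma virus"),
]


def lineage(start_ncbi_id, parent_key_column):
    """Return ancestor names, root first, for the row with start_ncbi_id.

    parent_key_column selects the column that parent_taxon_id is matched
    against: 0 for taxon_id, 1 for ncbi_taxon_id.
    """
    by_ncbi = {row[1]: row for row in ROWS}
    by_key = {row[parent_key_column]: row for row in ROWS}
    names = []
    seen = set()  # guards against cycles in malformed data
    row = by_key.get(by_ncbi[start_ncbi_id][2])
    while row is not None and row[1] not in seen:
        seen.add(row[1])
        names.append(row[3])
        row = by_key.get(row[2])
    return names[::-1]


# Resolving against ncbi_taxon_id on this (BioPerl-loaded) table
wrong = lineage(11876, parent_key_column=1)
# Resolving against taxon_id on the same table
right = lineage(11876, parent_key_column=0)
```

On this data the first call reproduces the plant and insect names Deepak saw 
from BioJava, and the second gives the retrovirus lineage he saw from BioPerl.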

This sounds like a case for discussion on both lists - a matter of 
standardisation between the two projects. Not quickly/easily solvable for now.

cheers,
Richard

On 11 Apr 2010, at 22:08, Deepak Sheoran wrote:

> I am using the same table with the BioJava and BioPerl taxon programs, and 
> the output I get is below:
> 
> BioJava:
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I 
> get is 
>             Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia 
> australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. 
> haydenii. 
> 
> BioJava's process for finding names: 
> 11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240   
> (wrong way of doing things)
> 
> BioPerl:    
> For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I 
> get is 
>           Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified  
> Alpharetrovirus.
> 
> BioPerl's process for finding names: 11876==>353825==>153057==>327045==>11632   
> (right way of doing things)
> 
> Hint: BioJava searches the ncbi_taxon_id column with a value from 
> parent_taxon_id, whereas BioPerl searches the taxon_id column with a value 
> from parent_taxon_id.
> 
> Contents of the taxon and taxon_name tables relevant to this discussion:
> 
> taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
> 2901      3609           276240           genus      Rhamnus                             scientific name
> 3610      4403           3609             species    Platanus occidentalis               scientific name
> 29052     48579          4403             species    Suillus placidus                    scientific name
> 114412    143975         48579            species    Diadasia australis                  scientific name
> 143976    176516         143975           species    Arnicastrum guerrerense             scientific name
> 30680     50447          176516           family     Labiduridae                         scientific name
> 254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
> 9394      11632          17394            family     Retroviridae                        scientific name
> 277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
> 122448    153057         277861           genus      Alpharetrovirus                     scientific name
> 301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
> 9584      11876          301952           species    Avian sarcoma virus                 scientific name
> 
> Thanks
> Deepak 
> 
> On 4/11/2010 2:53 PM, Richard Holland wrote:
>> I'm sorry but I don't understand your example. Could you provide a real 
>> example of correct values for each column from a sample taxon entry in NCBI, 
>> plus an example of what BioJava is doing wrong? (i.e. give a sample record 
>> to use as reference, then point out the correct value of parent_taxon_id, 
>> and point out what value BioJava is using instead).
>> 
>> thanks,
>> Richard
>> 
>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>> 
>>   
>> 
>>> Hi,
>>> 
>>> There is a very fundamental issue in the SimpleNCBITaxon class, because of 
>>> which it is producing the wrong taxonomy hierarchy. I am explaining what I 
>>> have found; let me know what you think of it and how it might be fixed.
>>> 
>>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, 
>>> nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>>> 2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the 
>>> parent's ncbi_taxon_id for the current ncbi_taxon_id value, but that is not 
>>> true. The value that "parent_taxon_id" holds is the "taxon_id" of the row 
>>> whose ncbi_taxon_id is the parent of the current ncbi_taxon_id.
>>> 
>>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>>> <property name="nodeRank" column="node_rank"/>
>>> <property name="geneticCode" column="genetic_code"/>
>>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>>> <property name="leftValue" column="left_value"/>
>>> <property name="rightValue" column="right_value"/>
>>> <property name="parentNCBITaxID" column="parent_taxon_id"/>      ----- this 
>>> is not correct: the parent_taxon_id column stores the taxon_id of the row 
>>> that holds the parent's ncbi_taxon_id for the current entry
>>> 
>>> Thanks
>>> Deepak Sheoran
>>> 
>>> 
>>>     
>>> 
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: 
>> [email protected]
>> http://www.eaglegenomics.com/
>> 
>> 
>>   
>> 
> 

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/


_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
