Incidentally, BioJava's approach matches the description in the BioSQL docs at:

http://biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME

(first example SQL statement - find the taxon id of the parent taxon for
'Homo sapiens' using a self-join)

The BioPerl/BioSQL load_ncbi_taxonomy.pl script however does not match this
description.

cheers,
Richard

On 12 Apr 2010, at 07:57, Richard Holland wrote:

> Thanks Deepak.
>
> I've had a look at the code and I believe it's due to the different ways in
> which BioJava and BioPerl load the taxon table.
>
> BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the
> values from the NCBI taxonomy file. The taxon_id column in BioJava is a
> meaningless auto-generated value that is never used.
>
> BioPerl however is generating taxon_id values and linking them by setting
> parent_taxon_id to the generated value. The parent value from the NCBI
> taxonomy file is therefore replaced with the BioPerl generated parent ID,
> meaning that instead of linking from parent_taxon_id to ncbi_taxon_id as per
> BioJava, the link is to taxon_id instead. (I'm basing this comment on looking
> at load_ncbi_taxonomy.pl from the BioSQL archives.)
>
> I believe if you load the taxonomy table using BioJava, you should see
> BioJava giving correct behaviour. Likewise if you load it using BioPerl,
> BioPerl will behave correctly. But if you load with one then query with the
> other, you'll get incorrect results.
>
> This sounds like a case for discussion on both lists - a matter of
> standardisation between the two projects. Not quickly/easily solvable for now.
>
> cheers,
> Richard
>
> On 11 Apr 2010, at 22:08, Deepak Sheoran wrote:
>
>> I am using the same table with the BioJava and BioPerl taxon programs, and
>> the output I get is below:
>>
>> Biojava:
>> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I
>> get is
>> Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia
>> australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var.
>> haydenii.
>>
>> Biojava process of finding names:
>> 11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
>> (wrong way of doing things)
>>
>> Bioperl:
>> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I
>> get is
>> Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified
>> Alpharetrovirus.
>>
>> Bioperl process of finding names: 11876==>353825==>153057==>327045==>11632
>> (Right way of doing things)
>>
>> Hint: BioJava searches the ncbi_taxon_id column with a value from
>> parent_taxon_id, whereas BioPerl searches the taxon_id column with a value
>> from parent_taxon_id.
>>
>> Taxon and taxon_name table content relevant to the discussion:
>>
>> taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
>> 2901      3609           276240           genus      Rhamnus                             scientific name
>> 3610      4403           3609             species    Platanus occidentalis               scientific name
>> 29052     48579          4403             species    Suillus placidus                    scientific name
>> 114412    143975         48579            species    Diadasia australis                  scientific name
>> 143976    176516         143975           species    Arnicastrum guerrerense             scientific name
>> 30680     50447          176516           family     Labiduridae                         scientific name
>> 254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
>> 9394      11632          17394            family     Retroviridae                        scientific name
>> 277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
>> 122448    153057         277861           genus      Alpharetrovirus                     scientific name
>> 301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
>> 9584      11876          301952           species    Avian sarcoma virus                 scientific name
>>
>> Thanks
>> Deepak
>>
>> On 4/11/2010 2:53 PM, Richard Holland wrote:
>>> I'm sorry but I don't understand your example. Could you provide a real
>>> example of correct values for each column from a sample taxon entry in
>>> NCBI, plus an example of what BioJava is doing wrong? (i.e.
>>> give a sample record to use as reference, then point out the correct
>>> value of parent_taxon_id, and point out what value BioJava is using
>>> instead).
>>>
>>> thanks,
>>> Richard
>>>
>>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>>>
>>>> Hi,
>>>>
>>>> There is a very fundamental issue in the SimpleNCBITaxon class because of
>>>> which it is producing the wrong taxonomy hierarchy. I am explaining what I
>>>> have found; let me know what you guys think of it, and suggest how to fix
>>>> it.
>>>>
>>>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id,
>>>> nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>>>> 2) In the class SimpleNCBITaxon we assume "parent_taxon_id" holds the
>>>> parent ncbi_taxon_id for the current ncbi_taxon_id value, but that is not
>>>> true. The value "parent_taxon_id" holds is the "taxon_id" of the row that
>>>> has the parent ncbi_taxon_id of the current ncbi_taxon_id.
>>>>
>>>> <property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
>>>> <property name="nodeRank" column="node_rank"/>
>>>> <property name="geneticCode" column="genetic_code"/>
>>>> <property name="mitoGeneticCode" column="mito_genetic_code"/>
>>>> <property name="leftValue" column="left_value"/>
>>>> <property name="rightValue" column="right_value"/>
>>>> <property name="parentNCBITaxID" column="parent_taxon_id"/> ----- this is
>>>> not correct: the parent_taxon_id column stores the taxon_id of the row
>>>> that has the parent ncbi_taxon_id for the current entry
>>>>
>>>> Thanks
>>>> Deepak Sheoran
>>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: [email protected]
>>> http://www.eaglegenomics.com/
>>>
>>
>
> _______________________________________________
> Biojava-l mailing list -
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/biojava-l
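
To make the two linking conventions discussed in this thread concrete, here is a minimal sketch in Python (stdlib sqlite3) that walks the lineage for ncbi_taxon_id 11876 both ways, using the rows from Deepak's table. It is only an illustration, not BioJava or BioPerl code: the flattened `taxon` table (with `name` folded in instead of living in taxon_name) and the helpers `build_db` and `lineage` are assumptions made up for this example.

```python
import sqlite3

# Rows copied from the table quoted in this thread:
# (taxon_id, ncbi_taxon_id, parent_taxon_id, name)
ROWS = [
    (2901, 3609, 276240, "Rhamnus"),
    (3610, 4403, 3609, "Platanus occidentalis"),
    (29052, 48579, 4403, "Suillus placidus"),
    (114412, 143975, 48579, "Diadasia australis"),
    (143976, 176516, 143975, "Arnicastrum guerrerense"),
    (30680, 50447, 176516, "Labiduridae"),
    (254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
    (9394, 11632, 17394, "Retroviridae"),
    (277861, 327045, 9394, "Orthoretrovirinae"),
    (122448, 153057, 277861, "Alpharetrovirus"),
    (301952, 353825, 122448, "unclassified Alpharetrovirus"),
    (9584, 11876, 301952, "Avian sarcoma virus"),
]


def build_db():
    """Create an in-memory table with the sample rows (simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        " taxon_id INTEGER PRIMARY KEY,"
        " ncbi_taxon_id INTEGER,"
        " parent_taxon_id INTEGER,"
        " name TEXT)"
    )
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?)", ROWS)
    return conn


def lineage(conn, ncbi_taxon_id, link_column):
    """Return ancestor names, root first, for the given NCBI taxon id.

    link_column is the column that parent_taxon_id is matched against:
    "taxon_id" (what the BioPerl loader is described as producing) or
    "ncbi_taxon_id" (what BioJava is described as expecting).
    """
    if link_column not in ("taxon_id", "ncbi_taxon_id"):
        raise ValueError("link_column must be 'taxon_id' or 'ncbi_taxon_id'")
    row = conn.execute(
        "SELECT parent_taxon_id FROM taxon WHERE ncbi_taxon_id = ?",
        (ncbi_taxon_id,),
    ).fetchone()
    names = []
    seen = set()  # guards against cycles in malformed data
    while row is not None and row[0] not in seen:
        seen.add(row[0])
        parent = conn.execute(
            f"SELECT parent_taxon_id, name FROM taxon WHERE {link_column} = ?",
            (row[0],),
        ).fetchone()
        if parent is None:  # parent not present in the sample rows
            break
        names.append(parent[1])
        row = parent
    names.reverse()
    return names


if __name__ == "__main__":
    conn = build_db()
    print("link to taxon_id:     ", "; ".join(lineage(conn, 11876, "taxon_id")))
    print("link to ncbi_taxon_id:", "; ".join(lineage(conn, 11876, "ncbi_taxon_id")))
```

On these sample rows, matching parent_taxon_id against taxon_id reproduces the Retroviridae lineage Deepak reports for BioPerl, while matching it against ncbi_taxon_id reproduces the Rhamnus chain he reports for BioJava. That is consistent with Richard's explanation that the table was loaded under one convention and queried under the other.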
