If we don't want to change the current BioJava code but still want to
fix this bug, I have found a way.
1) Edit the Hibernate mapping file "Taxon.hbm.xml" and replace the line
<property name="parentNCBITaxID" column="parent_taxon_id"/>
with
<property name="parentNCBITaxID" formula="(select tax.ncbi_taxon_id from
taxon tax where tax.taxon_id = parent_taxon_id)"/>
With this change to the Hibernate mapping I get the correct lineage for
ncbi_taxon_id = 11876 (Avian sarcoma virus), which is
Viruses; Retro-transcribing viruses; Retroviridae;
Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.
2) A possible issue is the taxonomy loader class, which needs to write a
value for parent_taxon_id into the taxon table. I think that won't be
possible if we make this change to the Hibernate config file, because a
formula-based property is read-only.
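To see what the subselect in the proposed formula computes, here is a minimal Python sketch using an in-memory SQLite table. It assumes only three rows taken from the taxon table quoted later in this thread; the function name parent_ncbi_ids is hypothetical, and this is not how Hibernate itself runs the formula.

```python
import sqlite3

# Rows copied from the taxon table quoted later in this thread:
# (taxon_id, ncbi_taxon_id, parent_taxon_id)
ROWS = [
    (9584, 11876, 301952),     # Avian sarcoma virus
    (301952, 353825, 122448),  # unclassified Alpharetrovirus
    (254757, 301952, 50447),   # Oreostemma alpigenum var. haydenii
]


def parent_ncbi_ids(rows):
    """Map each ncbi_taxon_id to its parent's ncbi_taxon_id (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, "
        "ncbi_taxon_id INTEGER, parent_taxon_id INTEGER)"
    )
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?)", rows)
    # Same subselect as the proposed formula, correlated on the outer row.
    query = (
        "SELECT t.ncbi_taxon_id, "
        "(SELECT tax.ncbi_taxon_id FROM taxon tax "
        " WHERE tax.taxon_id = t.parent_taxon_id) "
        "FROM taxon t"
    )
    result = dict(conn.execute(query).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(parent_ncbi_ids(ROWS))
```

For 11876 the subselect resolves parent_taxon_id 301952 through the taxon_id column and yields 353825 (unclassified Alpharetrovirus), not the unrelated row whose ncbi_taxon_id happens to be 301952.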
Deepak Sheoran
On 4/11/2010 4:08 PM, Deepak Sheoran wrote:
I am using the same table with the BioJava and BioPerl taxon programs, and
the output I get is below:
*Biojava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the
lineage I get is
Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia
australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum
var. haydenii.
How BioJava resolves the names:
11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240
(wrong)
*Bioperl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the
lineage I get is
Retroviridae; Orthoretrovirinae; Alpharetrovirus;
unclassified Alpharetrovirus.
How BioPerl resolves the names:
11876==>353825==>153057==>327045==>11632 (right)
Hint: BioJava searches the ncbi_taxon_id column for the value taken from
parent_taxon_id, whereas BioPerl searches the taxon_id column for that
same value.
*Rows from the taxon and taxon_name tables that are relevant to this
discussion:*
taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name
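The two lookup strategies from the hint can be compared directly on the rows listed above. The following is a minimal Python sketch, assuming only those rows; the names TAXA, BY_NCBI, lineage_by_taxon_id and lineage_by_ncbi_id are hypothetical, and this is neither BioJava nor BioPerl code.

```python
# Rows from the table above: taxon_id -> (ncbi_taxon_id, parent_taxon_id, name)
TAXA = {
    2901: (3609, 276240, "Rhamnus"),
    3610: (4403, 3609, "Platanus occidentalis"),
    29052: (48579, 4403, "Suillus placidus"),
    114412: (143975, 48579, "Diadasia australis"),
    143976: (176516, 143975, "Arnicastrum guerrerense"),
    30680: (50447, 176516, "Labiduridae"),
    254757: (301952, 50447, "Oreostemma alpigenum var. haydenii"),
    9394: (11632, 17394, "Retroviridae"),
    277861: (327045, 9394, "Orthoretrovirinae"),
    122448: (153057, 277861, "Alpharetrovirus"),
    301952: (353825, 122448, "unclassified Alpharetrovirus"),
    9584: (11876, 301952, "Avian sarcoma virus"),
}

# Secondary index keyed by ncbi_taxon_id.
BY_NCBI = {ncbi: (tid, parent, name) for tid, (ncbi, parent, name) in TAXA.items()}


def lineage_by_taxon_id(ncbi_id):
    """Treat parent_taxon_id as a taxon_id (the BioPerl behaviour described above)."""
    _, parent, _ = BY_NCBI[ncbi_id]
    names = []
    while parent in TAXA:  # stop when the parent row is not in our sample
        _, next_parent, name = TAXA[parent]
        names.append(name)
        parent = next_parent
    return names[::-1]  # root-most ancestor first


def lineage_by_ncbi_id(ncbi_id):
    """Treat parent_taxon_id as an ncbi_taxon_id (the BioJava behaviour described above)."""
    _, parent, _ = BY_NCBI[ncbi_id]
    names = []
    while parent in BY_NCBI:  # stop when no row has this ncbi_taxon_id
        _, next_parent, name = BY_NCBI[parent]
        names.append(name)
        parent = next_parent
    return names[::-1]


if __name__ == "__main__":
    print("; ".join(lineage_by_taxon_id(11876)))
    print("; ".join(lineage_by_ncbi_id(11876)))
```

On this sample the first function ends at Retroviridae because its parent row (taxon_id 17394) is not listed, and the second ends at Rhamnus because no listed row has ncbi_taxon_id 276240.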
Thanks
Deepak
On 4/11/2010 2:53 PM, Richard Holland wrote:
I'm sorry but I don't understand your example. Could you provide a real example
of correct values for each column from a sample taxon entry in NCBI, plus an
example of what BioJava is doing wrong? (i.e. give a sample record to use as
reference, then point out the correct value of parent_taxon_id, and point out
what value BioJava is using instead).
thanks,
Richard
On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
Hi,
There is a fundamental issue in the SimpleNCBITaxon class that causes it to
produce the wrong taxonomy hierarchy. I explain what I have found below; let me
know what you think of it, and please suggest how to fix it.
1) The columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id,
node_rank, genetic_code, mito_genetic_code, left_value, right_value).
2) The SimpleNCBITaxon class assumes that "parent_taxon_id" holds the ncbi_taxon_id of the
parent of the current entry, but that is not true. "parent_taxon_id" actually holds the
"taxon_id" of the parent row, and that row's ncbi_taxon_id is the parent's NCBI taxon ID.
<property name="NCBITaxID" column="ncbi_taxon_id" node="@NCBITaxId"/>
<property name="nodeRank" column="node_rank"/>
<property name="geneticCode" column="genetic_code"/>
<property name="mitoGeneticCode" column="mito_genetic_code"/>
<property name="leftValue" column="left_value"/>
<property name="rightValue" column="right_value"/>
<property name="parentNCBITaxID" column="parent_taxon_id"/> ----- this is not
correct: the parent_taxon_id column stores the taxon_id of the parent row, not the parent's ncbi_taxon_id
Thanks
Deepak Sheoran
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E:[email protected]
http://www.eaglegenomics.com/
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l