You can also get the hierarchy directly from the NCBI taxonomy dump.
The snippet below is in Groovy, but it should give you the idea: it reads
nodes.dmp and records the rank and the parent of every taxon.

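// Minimal node holder. This is an assumption: the real TreeNode class
// is not part of this snippet.
class TreeNode {
    Integer taxid
    String rank
}

HashMap<Integer, TreeNode> taxid2node = [:]
HashMap<Integer, Integer> child2parent = [:]

// nodes.dmp fields are separated by "\t|\t"; the first three fields are
// the tax id, the parent tax id and the rank
def nodePattern = ~/^(\d+)\t\|\t(\d+)\t\|\t(.+?)\t\|/

new File("/home/martin/nodes.dmp").eachLine { line ->
    def matcher = (line =~ nodePattern)
    // find() matches from the start of the line; the pattern only
    // covers the first three fields
    if (matcher.find()) {
        Integer myId = matcher.group(1).toInteger()
        Integer parentId = matcher.group(2).toInteger()
        String myRank = matcher.group(3)

        taxid2node[myId] = new TreeNode(taxid: myId, rank: myRank)
        child2parent[myId] = parentId
    }
}
// do something with the hash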
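
One thing you could do with child2parent is walk it from a taxon up to the
root to get its lineage. Here is a rough sketch in Java. The class name
LineageSketch and the method lineage are made up for illustration, and the
three sample entries are typed in by hand rather than read from nodes.dmp.
It assumes the root node lists itself as its own parent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of BioJava or of the snippet above.
class LineageSketch {

    // Follows child -> parent links and returns the tax ids from the
    // starting taxon up to the root. A taxon that is not in the map
    // yields a list containing only that taxon.
    static List<Integer> lineage(Map<Integer, Integer> child2parent, int taxid) {
        List<Integer> path = new ArrayList<>();
        Integer current = taxid;
        // The contains() check guards against accidental cycles in the data.
        while (current != null && !path.contains(current)) {
            path.add(current);
            Integer parent = child2parent.get(current);
            if (parent == null || parent.equals(current)) {
                break; // unknown taxon, or the root pointing at itself
            }
            current = parent;
        }
        return path;
    }

    public static void main(String[] args) {
        // Hand-written sample: root (1), cellular organisms (131567),
        // Eukaryota (2759).
        Map<Integer, Integer> child2parent = new HashMap<>();
        child2parent.put(1, 1);
        child2parent.put(131567, 1);
        child2parent.put(2759, 131567);

        System.out.println(lineage(child2parent, 2759)); // prints [2759, 131567, 1]
    }
}
```

To turn the ids into names you would still need a second map built from
names.dmp, which is not shown here.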



-Martin



On 2 April 2010 08:38, Richard Holland <[email protected]> wrote:
> The parsers don't load the hierarchy from GenBank because it is redundant
> information that is separately available from the NCBI taxonomy. It also
> tends to be buggy and can differ between GenBank files for the same organism.
>
> If you want the hierarchy, you need to be using BioJava in conjunction with
> BioSQL and load the NCBI taxonomy into your BioSQL instance ( 
> http://www.biojava.org/wiki/BioJava:BioJavaXDocs#NCBI_Taxonomy_data ), from 
> where BioJava can then retrieve it using the sample code you show in your 
> email.
>
> thanks,
> Richard
>
> On 2 Apr 2010, at 04:02, Huijie Qiao wrote:
>
>> version 1.7.1
>>
>> line 361
>> else if (sectionKey.equals(SOURCE_TAG)) {
>>      // ignore - can get all this from the first feature
>>
>> Actually, the content of the SOURCE_TAG section and the first feature
>> differ in some GenBank files.
>>
>> For example, the example file in
>> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb
>>
>> The Source TAG is
>> SOURCE      Bos taurus (cattle)
>>  ORGANISM  Bos taurus
>>            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>>            Pecora; Bovidae; Bovinae; Bos.
>>
>> and the first feature tag is
>> FEATURES             Location/Qualifiers
>>     source          1..1136
>>                     /organism="Bos taurus"
>>                     /mol_type="mRNA"
>>                     /db_xref="taxon:9913"
>>                     /clone="pBB2I"
>>                     /tissue_type="liver"
>>
>> I can't get the hierarchy info with the following code:
>> NCBITaxon taxon = seq.getTaxon();
>> System.out.println(taxon.getNameHierarchy()); // output is "."
>> _______________________________________________
>> Biojava-l mailing list  -  [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: [email protected]
> http://www.eaglegenomics.com/
>
>
