You can also get the hierarchy directly from the NCBI taxonomy dump (nodes.dmp).
This is Groovy, but it should give you the idea:
// TreeNode is not shown here; it is assumed to be a simple class with
// taxid and rank properties.
HashMap<Integer, TreeNode> taxid2node = [:]
HashMap<Integer, Integer> child2parent = [:]

// First three columns of nodes.dmp: tax id, parent tax id, rank
def nodePattern = ~/^(\d+)\t\|\t(\d+)\t\|\t(.+?)\t\|/
def count = 0

new File("/home/martin/nodes.dmp").eachLine { line ->
    count++  // number of lines read
    def matcher = (line =~ nodePattern)
    if (matcher.matches()) {
        Integer myId = matcher[0][1].toInteger()
        Integer parentId = matcher[0][2].toInteger()
        String myRank = matcher[0][3]
        def node = new TreeNode(taxid: myId, rank: myRank)
        taxid2node[(myId)] = node
        child2parent[(myId)] = parentId
    }
}
// do something with the hash
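
One thing you might do with the hash is walk child2parent upwards to get the
lineage of a taxon. Below is a minimal sketch of that in plain Java, not part
of the script above. The class and method names (TaxonomyLineage,
parseParents, lineage) are made up for illustration, and the sample rows are a
shortened toy chain in nodes.dmp layout, not the real dump content. It relies
on the root node (taxid 1) being listed as its own parent in nodes.dmp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TaxonomyLineage {

    // Build a child -> parent map from lines in nodes.dmp layout.
    // Fields are separated by "\t|\t" and each line ends with "\t|".
    static Map<Integer, Integer> parseParents(List<String> lines) {
        Map<Integer, Integer> child2parent = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\\t\\|\\t?");
            if (fields.length < 2) {
                continue; // skip lines that do not look like node records
            }
            int taxId = Integer.parseInt(fields[0].trim());
            int parentId = Integer.parseInt(fields[1].trim());
            child2parent.put(taxId, parentId);
        }
        return child2parent;
    }

    // Follow parent links from taxId up to the root; result is root-first.
    static List<Integer> lineage(Map<Integer, Integer> child2parent, int taxId) {
        List<Integer> path = new ArrayList<>();
        Integer current = taxId;
        while (current != null) {
            path.add(0, current);
            Integer parent = child2parent.get(current);
            // Stop at an unknown node or at the root, which is its own parent.
            if (parent == null || parent.equals(current)) {
                break;
            }
            current = parent;
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy input: the real chain from 9913 to the root is much longer.
        List<String> sample = Arrays.asList(
            "1\t|\t1\t|\tno rank\t|",
            "9903\t|\t1\t|\tgenus\t|",
            "9913\t|\t9903\t|\tspecies\t|");
        Map<Integer, Integer> child2parent = parseParents(sample);
        System.out.println(lineage(child2parent, 9913)); // prints [1, 9903, 9913]
    }
}
```

To turn the taxids into names you would also need names.dmp from the same
dump, which is not handled here.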
-Martin
On 2 April 2010 08:38, Richard Holland <[email protected]> wrote:
> The parsers don't load the hierarchy from Genbank because it is redundant
> information separately available from NCBI taxonomy. Also it tends to be
> buggy and can differ between Genbank files for the same organism.
>
> If you want the hierarchy, you need to be using BioJava in conjunction with
> BioSQL and load the NCBI taxonomy into your BioSQL instance (
> http://www.biojava.org/wiki/BioJava:BioJavaXDocs#NCBI_Taxonomy_data ), from
> where BioJava can then retrieve it using the sample code you show in your
> email.
>
> thanks,
> Richard
>
> On 2 Apr 2010, at 04:02, Huijie Qiao wrote:
>
>> version 1.7.1
>>
>> line 361
>> else if (sectionKey.equals(SOURCE_TAG)) {
>> // ignore - can get all this from the first feature
>>
>> Actually, the content of the SOURCE_TAG section and the first feature differ
>> in some GenBank files.
>>
>> For example, the example file in
>> http://eutils.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=5&rettype=gb
>>
>> The Source TAG is
>> SOURCE      Bos taurus (cattle)
>>   ORGANISM  Bos taurus
>>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>>             Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
>>             Pecora; Bovidae; Bovinae; Bos.
>>
>> and the first feature tag is
>> FEATURES             Location/Qualifiers
>>      source          1..1136
>>                      /organism="Bos taurus"
>>                      /mol_type="mRNA"
>>                      /db_xref="taxon:9913"
>>                      /clone="pBB2I"
>>                      /tissue_type="liver"
>>
>> I can't get the hierarchy info with the following code:
>> NCBITaxon taxon = seq.getTaxon();
>> System.out.println(taxon.getNameHierarchy()); // output is "."
>> _______________________________________________
>> Biojava-l mailing list - [email protected]
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: [email protected]
> http://www.eaglegenomics.com/
>