My solution is to download the taxonomy files from GenBank, which map every GI number to a taxonomy ID and describe the hierarchical taxonomy tree. You can then write a program that partitions the protein NR file into separate files or folders, one per taxonomy ID that descends from the Eukaryota node in the taxonomy tree.
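A rough sketch of the filtering step in Python might look like the following. It is only an illustration under some assumptions: that the tree comes from a nodes.dmp-style file whose first two "|"-separated fields are the taxonomy ID and its parent, that the GI mapping is a two-column file of GI and taxonomy ID (such as gi_taxid_prot.dmp), and that Eukaryota has taxonomy ID 2759. The function names are made up for this example.

```python
from collections import defaultdict

EUKARYOTA_TAXID = 2759  # NCBI taxonomy ID for Eukaryota


def load_parents(lines):
    """Build a child -> parent map from nodes.dmp-style lines.

    Assumes the first two '|'-separated fields are tax_id and parent tax_id.
    """
    parents = {}
    for line in lines:
        fields = line.split("|")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        parents[int(fields[0].strip())] = int(fields[1].strip())
    return parents


def is_descendant(taxid, ancestor, parents):
    """Walk up the tree from taxid; True if ancestor is reached.

    A node counts as its own descendant. The 'seen' set stops the walk
    at the root, which lists itself as its parent.
    """
    seen = set()
    while taxid in parents and taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)
        taxid = parents[taxid]
    return taxid == ancestor


def group_eukaryote_gis(gi_taxid_lines, parents):
    """Group GI numbers by taxonomy ID, keeping only eukaryotic IDs.

    Assumes each line holds two whitespace-separated integers: GI, tax_id.
    """
    groups = defaultdict(list)
    for line in gi_taxid_lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        gi, taxid = int(parts[0]), int(parts[1])
        if is_descendant(taxid, EUKARYOTA_TAXID, parents):
            groups[taxid].append(gi)
    return dict(groups)
```

With the real files you would pass open file handles, e.g. load_parents(open("nodes.dmp")), and then use the resulting groups to decide which output file each NR sequence goes to. For the full GI table you would probably want to cache the result of is_descendant per taxonomy ID rather than walking the tree for every line.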
The GenBank taxonomy files are located at ftp://ftp.ncbi.nih.gov/pub/taxonomy/
_______________________________________________
BBB mailing list
[email protected]
http://www.bioinformatics.org/mailman/listinfo/bbb
