Hello Jérôme,

Yes, that is normal. It comes from a combination of three things. First, the index is a tree structure; second, the tree isn't tightly packed; and third, 64-bit pointers are used throughout. The first allows on-the-fly updating of the index, the second is for speed of construction and updating, and the third is what lets the index address very large files. Another consideration is that, in some cases, the indexes are trees of trees so that duplicate codes (e.g. keywords) can be indexed.
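To give a feel for how those three factors multiply, here is a rough back-of-the-envelope sketch in Python. It is not the dbxflat on-disk format: the function name, the 2048-byte page size, the 16-byte keys and the fill factors are all assumptions picked purely for illustration. It only compares a half-full tree with 64-bit pointers against a fully packed one with 32-bit pointers.

```python
import math


def estimate_btree_index_bytes(num_keys, key_bytes, pointer_bytes=8,
                               page_bytes=2048, fill_factor=0.5):
    """Rough size of a B+tree-like index, counting whole pages.

    Every parameter here is an illustrative assumption, not a value
    taken from EMBOSS.
    """
    entry_bytes = key_bytes + pointer_bytes
    # How many entries a page actually holds when only partly filled.
    entries_per_page = max(2, int(page_bytes // entry_bytes * fill_factor))

    total_pages = 0
    items = num_keys
    while True:
        pages = math.ceil(items / entries_per_page)
        total_pages += pages
        if pages <= 1:
            break
        # The next level up holds one entry per page of this level.
        items = pages
    return total_pages * page_bytes


if __name__ == "__main__":
    n = 10_000_000  # hypothetical number of indexed IDs
    loose = estimate_btree_index_bytes(n, 16, pointer_bytes=8, fill_factor=0.5)
    packed = estimate_btree_index_bytes(n, 16, pointer_bytes=4, fill_factor=1.0)
    print(f"half-full, 64-bit pointers: {loose / 1e6:.0f} MB")
    print(f"packed, 32-bit pointers:    {packed / 1e6:.0f} MB")
    print(f"ratio: {loose / packed:.1f}x")
```

Under these assumptions the loosely packed 64-bit tree comes out at more than twice the size of the packed one for a single index, and a trees-of-trees layout for duplicate keys would add to that.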
Coincidentally, I'm on the lookout for new indexing algorithms at the moment, so if you have a favourite one, we're always open to suggestions.

Alan

> Hello,
>
> I use dbxflat to index uniprot (sprot and trembl) flat files, for
> which the size is 1.2 G for sprot and 11 G for trembl. The resulting
> files are amazingly huge: 11 G. Is it normal?
>
> Another example with Genbank flat files: the division gbsts has a
> size of 3.3 G. Indexing with dbxflat gives 6.8 G of index files, but
> dbiflat gives only 199 M of index files. I know it's not necessary
> to index genbank flat files with dbxflat because each individual file
> is not bigger than 300 M. I did this just for the demonstration.
>
> Apart from this, all is working very well.
>
> Thank you in advance.
>
>
> Jérôme Laroche
>
> Centre de bioinformatique et de biologie computationnelle
> Université Laval
>
>
> _______________________________________________
> EMBOSS mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/emboss
