Re: [EMBOSS] dbxflat and size of index files

ajb Wed, 31 Oct 2007 14:14:08 -0800

Hello Jérôme,

Yes, it is normal. It is a combination of three things: first, the index
is a tree structure; second, the tree isn't tightly packed; and third,
64-bit pointers are used throughout. The first allows on-the-fly
updating of the index, the second is for speed of construction and
updating, and the third is needed to address large files. Another
consideration is that, in some cases, the indexes are trees-of-trees
to allow duplicate codes to be indexed (e.g. keywords).
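
To make the size argument concrete, here is a rough back-of-the-envelope
sketch comparing a tightly packed sorted table with a paged tree whose
pages are only partly full. The function names, page size, key width
and fill factor are all illustrative assumptions, not the actual
on-disk layout used by dbxflat or dbiflat.

```python
import math


def flat_index_bytes(n_entries, key_bytes, pointer_bytes):
    """Size of a tightly packed sorted table: one key and one offset per entry."""
    return n_entries * (key_bytes + pointer_bytes)


def tree_index_bytes(n_entries, key_bytes, pointer_bytes, page_bytes, fill):
    """Rough size of a paged tree index whose pages are only partly full.

    Every level is stored in fixed-size pages; `fill` is the assumed
    fraction of each page that actually holds entries (hypothetical value).
    """
    slot = key_bytes + pointer_bytes
    per_page = max(2, int((page_bytes // slot) * fill))
    total_pages = 0
    level = n_entries
    while True:
        # Pages needed to hold this level; each page becomes one entry
        # in the level above it.
        pages = max(1, math.ceil(level / per_page))
        total_pages += pages
        if pages == 1:
            break
        level = pages
    return total_pages * page_bytes


if __name__ == "__main__":
    # Assumed figures: 10 million entries, 12-byte keys, 2 KB pages.
    n = 10_000_000
    flat = flat_index_bytes(n, key_bytes=12, pointer_bytes=4)
    tree = tree_index_bytes(n, key_bytes=12, pointer_bytes=8,
                            page_bytes=2048, fill=0.5)
    print(f"flat: {flat / 1e6:.0f} MB, tree: {tree / 1e6:.0f} MB")
```

Under these assumptions the half-empty pages and the wider pointers
each multiply the size, which is the trade-off made for cheap inserts.
A trees-of-trees layout for duplicate codes would add further pages on
top of this.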


Coincidentally, I'm on the lookout for new indexing algorithms at the
moment, so if you have a favourite one, we're always open
to suggestions.

Alan


> Hello,
>
> I use dbxflat to index UniProt (sprot and trembl) flat files, which
> are 1.2 G for sprot and 11 G for trembl. The resulting index files are
> amazingly huge: 11 G. Is it normal?
>
> Another example with GenBank flat files: the division gbsts has a
> size of 3.3 G. Indexing with dbxflat gives 6.8 G of index files, but
> dbiflat gives only 199 M of index files. I know it's not necessary
> to index GenBank flat files with dbxflat because each individual file
> is no bigger than 300 M. I did this just for demonstration.
>
> Apart from this, everything is working very well.
>
> Thank you in advance.
>
>
> Jérôme Laroche
>
> Centre de bioinformatique et de biologie computationnelle
> Université Laval
>
>
> _______________________________________________
> EMBOSS mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/emboss
>


