Irene Langkilde Geary wrote:
I have no doubt that the C++ code I'm using is as efficient as it could be. (I didn't write it; it was loaned to me by the inventor of ADTrees.) However, the current code is limited to features with a maximum arity of 50. Some of my features have arities on the order of 30,000-40,000 possible values. So I'm exploring how to adjust the structure of ADTrees to handle high-arity attributes without losing their size/speed benefits. One possibility is to store only the 50 most frequent values at any given level of the tree, plus a 'misc.' catch-all value. This would be much more feasible than storing all 30,000 values, but it means that I might end up with many (thousands or tens of thousands of) unique record types (a.k.a. arity lists). That's why I'm interested in understanding how much space it takes to store a record's arity. Is it linear in the number of attributes?
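For concreteness, here is a minimal Oz sketch of the "top K + misc" idea. TopKPlusMisc and the Value#Count pair representation are hypothetical names of mine, not part of the ADTree code:

```oz
declare
% Counts: a list of Value#Count pairs observed at one tree level.
% Keep the K most frequent values; lump all others into a misc bucket.
fun {TopKPlusMisc Counts K}
   Sorted = {Sort Counts fun {$ X Y} X.2 > Y.2 end}  % sort by count, descending
   Top    = {List.take Sorted K}
   Rest   = {List.drop Sorted K}
   Misc   = {FoldL Rest fun {$ Acc P} Acc + P.2 end 0}
in
   misc#Misc | Top
end
{Show {TopKPlusMisc [a#5 b#1 c#9 d#2] 2}}
% the two least frequent values (d#2, b#1) collapse into misc#3
```

With K=50 the node stores at most 51 branches regardless of the feature's true arity, which is the size bound you are after.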
Record arities, just like dictionaries, are hash tables. Their size is linear in the number of attributes. Your description of the problem suggests using dictionaries instead of records, at least while building the tree. Once the tree is built, it can be converted to records (see Dictionary.toRecord).
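A small sketch of that pattern (the sample data and the record label are made up): accumulate per-node counts in a mutable dictionary during construction, then freeze it into a record once building is finished:

```oz
declare
D = {Dictionary.new}
% tally feature values while scanning the data
for V in [red green red blue red] do
   {Dictionary.put D V {Dictionary.condGet D V 0} + 1}
end
% once building is done, convert to an immutable record
R = {Dictionary.toRecord counts D}
{Show R}   % a record of the form counts(... red:3 ...)
```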
My record arities are NOT dynamic in the sense that I will be doing Adjoin operations. However, I could end up with a structure that contains many different arities. And during construction I will be using variables to refer to features (i.e., Tree.Feature). Doesn't this mean the arities will need to exist at run time, not just at compile time? Once the tree is built, I would want to pickle it and reuse it later to look up counts. I would not change it at all once it was fully constructed.
You can only pickle immutable data, hence no dictionaries. Pickling will require converting each dictionary into a record, or into a list of key-value pairs.
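For example (the file name and counts are placeholders of mine), a node frozen in either form can be pickled and reloaded like this:

```oz
declare
NodeDict = {Dictionary.new}
{Dictionary.put NodeDict red 3}
{Dictionary.put NodeDict blue 1}
% freeze: either as an immutable record ...
R = {Dictionary.toRecord counts NodeDict}
% ... or as a list of Key#Value pairs
KVs = {Dictionary.entries NodeDict}
% pickle the record, and reload it in a later session
{Pickle.save R 'adtree.ozp'}
R2 = {Pickle.load 'adtree.ozp'}
```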
How much space do dictionaries take compared to records? Are they the same size? I thought I read in a past posting somewhere that dictionaries had linear-time access to feature fields rather than constant-time access. That would be an important difference to me, because I care a lot about lookup-time efficiency. Is that true?
No, dictionaries are hash tables (using bucketing). Access to features has constant (expected) time complexity, the same as for records.
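As an illustration (the names below are invented for the example), both kinds of lookup cost constant time:

```oz
declare
D = {Dictionary.new}
{Dictionary.put D color red}
R = node(color:red)
{Show {Dictionary.get D color}}   % dictionary lookup: O(1) expected; shows red
{Show R.color}                    % record feature access: O(1); shows red
```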
Cheers, raph

_________________________________________________________________________________
mozart-users mailing list
[email protected]
http://www.mozart-oz.org/mailman/listinfo/mozart-users
