Irene Langkilde Geary wrote:
I have no doubt that the C++ code I'm using is as efficient as it could be. (I didn't write it; it was loaned to me by the inventor of ADTrees.) However, the current code is limited to features with a maximum arity of 50. Some of my features have arities on the order of 30,000-40,000 possible values. So I'm exploring how to adjust the structure of ADTrees to handle high-arity attributes without losing their size/speed benefits. One possibility is to store only the 50 most frequent values at any given level of the tree, plus a 'misc.' catch-all value. This would be much more feasible than storing all 30,000 values, but it means that I might end up with many (thousands or tens of thousands of) unique record types (a.k.a. arity lists). That's why I'm interested in understanding how much space it takes to store a record's arity. Is it linear in the number of attributes?
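For concreteness, here is a minimal Oz sketch of the "top K + misc" idea. TopKPlusMisc and the Value#Count pair representation are hypothetical names of mine, not part of the ADTree code:

```oz
declare
% Counts: a list of Value#Count pairs observed at one tree level.
% Keep the K most frequent values; lump all others into a misc bucket.
fun {TopKPlusMisc Counts K}
   Sorted = {Sort Counts fun {$ X Y} X.2 > Y.2 end}  % sort by count, descending
   Top    = {List.take Sorted K}
   Rest   = {List.drop Sorted K}
   Misc   = {FoldL Rest fun {$ Acc P} Acc + P.2 end 0}
in
   misc#Misc | Top
end
{Show {TopKPlusMisc [a#5 b#1 c#9 d#2] 2}}
% the two least frequent values (d#2, b#1) collapse into misc#3
```

With K=50 the node stores at most 51 branches regardless of the feature's true arity, which is the size bound you are after.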
Record arities, just like dictionaries, are hash tables. Their size is linear in the number of attributes. Your description of the problem suggests using dictionaries instead of records, at least while building the tree. Once the tree is built, it can be converted to records (see Dictionary.toRecord).
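A small sketch of that pattern (the sample data and the record label are made up): accumulate per-node counts in a mutable dictionary during construction, then freeze it into a record once building is finished:

```oz
declare
D = {Dictionary.new}
% tally feature values while scanning the data
for V in [red green red blue red] do
   {Dictionary.put D V {Dictionary.condGet D V 0} + 1}
end
% once building is done, convert to an immutable record
R = {Dictionary.toRecord counts D}
{Show R}   % a record of the form counts(... red:3 ...)
```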
My record arities are NOT dynamic in the sense that I will be doing Adjoin operations. However, I could end up with a structure that contains many different arities. And during construction I will be using variables to refer to features (i.e., Tree.Feature). Doesn't this mean the arities will need to exist at run time, not just at compile time? Once the tree is built, I would want to pickle it and reuse it later to look up counts. I would not change it at all once it was fully constructed.
You can only pickle immutable data, hence no dictionaries. Pickling will require converting each dictionary into a record, or into a list of key-value pairs.
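For example (the file name and counts are placeholders of mine), a node frozen in either form can be pickled and reloaded like this:

```oz
declare
NodeDict = {Dictionary.new}
{Dictionary.put NodeDict red 3}
{Dictionary.put NodeDict blue 1}
% freeze: either as an immutable record ...
R = {Dictionary.toRecord counts NodeDict}
% ... or as a list of Key#Value pairs
KVs = {Dictionary.entries NodeDict}
% pickle the record, and reload it in a later session
{Pickle.save R 'adtree.ozp'}
R2 = {Pickle.load 'adtree.ozp'}
```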
How much space do dictionaries take compared to records? Are they the same size? I thought I read in a past posting somewhere that dictionaries had linear-time access to feature fields rather than constant-time access. That would be an important difference to me, because I care a lot about lookup-time efficiency. Is that true?
No, dictionaries are hash tables (using bucketing). Access to features has constant (expected) time complexity, the same as for records.
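As an illustration (the names below are invented for the example), both kinds of lookup cost constant time:

```oz
declare
D = {Dictionary.new}
{Dictionary.put D color red}
R = node(color:red)
{Show {Dictionary.get D color}}   % dictionary lookup: O(1) expected; shows red
{Show R.color}                    % record feature access: O(1); shows red
```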
Cheers, raph

_________________________________________________________________________________
mozart-users mailing list
[email protected]
http://www.mozart-oz.org/mailman/listinfo/mozart-users
