At 01:41 PM 1/15/01 +1100, Martin Sevior wrote:
>On Sun, 14 Jan 2001, John L. Clark wrote:
>> Is our interface to document properties really done solely by lookup tables
>> of strings? If so, why is it not done instead like a database, with IDs
>> for properties and their values, which map to strings when necessary for
>> writing? I'm still poring over our piecetable and surrounding
>> structure code, and it will be a while until I am at all comfortable
>> with it, so forgive my naivety.
>
>These are good questions. Yes, our interface to doc properties is all
>through strings.
>
>Dom recently committed code to do binary searches on string properties but
>we have also considered using enums for doc properties too. I'm not sure
>why the abi designers went with const strings over enums. enums would
>certainly be much faster. I guess they thought that using const strings
>would be more robust and perhaps more easily interfaced to XML parsers.

I don't want to open up a flame war, but here's the history...

Perhaps we've seen too many network protocols in our lives, but Jeff and I
made that design decision. (IIRC, Eric was the first to propose converting
to enums, but he neither wrote the necessary code nor convinced us to do so.)

I think the real reason we've been happy with the strings all along is that
they're self-documenting, robust, and quite scalable. Adding a new property
requires new code only on the edges -- all of the core property-handling
logic down in the piece table and our importer/exporter stays the same.
Admittedly, this design favors our native file format at the expense of
others, but that's a feature, not a bug. ;-) It sure does wonders for our
ability to support forwards and backwards compatibility in our file format.
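
To make the "new code only on the edges" point concrete, here is a minimal
sketch. It is not AbiWord's actual code: the PropertyBag class, its
"name:value; name:value" text form, and the x-future-prop name are all
hypothetical, chosen only for illustration. The core stores and returns
name/value pairs without interpreting them, so a property written by a newer
version survives a load/save round trip in an older one.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical stand-in for a property set; not AbiWord's real class.
class PropertyBag {
public:
    // Core logic: store any name/value pair without interpreting it.
    void set(const std::string& name, const std::string& value) {
        props_[name] = value;
    }

    // Returns true and fills `value` if the property is present.
    bool get(const std::string& name, std::string& value) const {
        const auto it = props_.find(name);
        if (it == props_.end()) return false;
        value = it->second;
        return true;
    }

    // Exporter edge: write "name:value; name:value", sorted by name.
    std::string serialize() const {
        std::ostringstream out;
        bool first = true;
        for (const auto& kv : props_) {
            if (!first) out << "; ";
            out << kv.first << ':' << kv.second;
            first = false;
        }
        return out.str();
    }

    // Importer edge: parse the same form, keeping names it has never seen.
    static PropertyBag parse(const std::string& text) {
        PropertyBag bag;
        std::istringstream in(text);
        std::string item;
        while (std::getline(in, item, ';')) {
            const auto colon = item.find(':');
            if (colon == std::string::npos) continue;  // skip malformed entries
            bag.set(trim(item.substr(0, colon)), trim(item.substr(colon + 1)));
        }
        return bag;
    }

private:
    std::map<std::string, std::string> props_;
};

// Usage: an unknown property passes through untouched.
void propertyBagExample() {
    const PropertyBag bag =
        PropertyBag::parse("font-weight:bold; x-future-prop:42");
    std::string value;
    assert(bag.get("x-future-prop", value) && value == "42");
    assert(bag.serialize() == "font-weight:bold; x-future-prop:42");
}
```

With an enum-keyed store, the parse step would need somewhere to put
x-future-prop, which is the kind of extra machinery the string design avoids.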
The two major weaknesses of this decision are:

1. The properties aren't documented well enough (outside of the property
parsing logic in the bowels of the code). This is a bug.

2. Any form of tokenization would be faster. In fact, that's why Jeff did
so much work to condense attrprops down in the piece table. However (to
borrow a page from Thomas' book), I haven't seen any profiling results since
then which suggest that's where we really need more speed.
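
As a rough illustration of how string properties can be condensed without
changing the string-based interface, the sketch below interns whole property
sets: each distinct set is stored once and referred to by a small integer.
This is an assumption about the general technique, not a description of the
actual piece table code; PropertySet and PropertyPool are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// A property set is still plain strings at the interface.
using PropertySet = std::map<std::string, std::string>;

// Hypothetical pool that stores each distinct property set exactly once.
class PropertyPool {
public:
    // Returns the index of `props`, adding it only if no identical set exists.
    std::size_t intern(const PropertySet& props) {
        const auto it = index_.find(props);
        if (it != index_.end()) return it->second;
        const std::size_t id = sets_.size();
        sets_.push_back(props);
        index_.emplace(props, id);
        return id;
    }

    // Maps an index back to its strings, e.g. when writing a file.
    const PropertySet& lookup(std::size_t id) const { return sets_.at(id); }

    std::size_t size() const { return sets_.size(); }

private:
    std::vector<PropertySet> sets_;             // index -> property set
    std::map<PropertySet, std::size_t> index_;  // property set -> index
};

// Usage: two runs of bold text share one stored set and compare by integer.
void propertyPoolExample() {
    PropertyPool pool;
    const PropertySet bold = {{"font-weight", "bold"}};
    const PropertySet italic = {{"font-style", "italic"}};
    const std::size_t a = pool.intern(bold);
    const std::size_t b = pool.intern(italic);
    const std::size_t c = pool.intern(bold);
    assert(a == c && a != b);
    assert(pool.size() == 2);
}
```

Callers still pass and receive strings, so nothing at the edges changes; only
the storage and the equality checks get cheaper.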
By contrast, if we introduced a translation layer to enums (or whatever),
I'm not sure that the gains would be all that worthwhile. It introduces a
level of API complexity which just feels wrong. Consider the code you've
seen to handle subsequent versions of the following:

- binary vs. text file formats
- binary vs. text network protocols

In both cases, any performance gains tend to be outweighed by the complexity
and brittleness of the code needed to implement them.

Paul