On Tue, 9 Oct 2007, Phillip J. Eby wrote:
At 04:12 PM 10/9/2007 -0700, Andi Vajda wrote:
On Tue, 9 Oct 2007, Phillip J. Eby wrote:
1. application-level code meddling in storage-level details
Could you give some examples?
Any place where the application is creating collections or working with
indexes in order to achieve acceptable performance, as compared to "naive"
iteration or queries.
I see. Creating a collection is like creating a query. In the relational
world, do you propose that the app not write queries?
On indexes I see your point, I think. An index is a query's cache, not
something one would want to expose to the app. Funnily enough, indexes only
_became_ that later: they started as a way to access collections by row number
for displaying in the UI. Later they became a query's cache (when we
implemented abstract sets and collections), and then they were also used as a
way of persisting sort order.
In any case, I don't see how this is different in a relational model.
Once you work on extracting performance from a relational app, you end up
writing hardcoded queries that embed very specific app knowledge.
But maybe this thread isn't about relational vs object - though I'm afraid it
is - but rather about better app layering?
2. lack of sufficient domain-specific query APIs
Again, could you please give an example of what you'd like?
This isn't a repository problem - it's a domain-layer problem. If the places
where we're doing #1 were at least consolidated to single points of
reference, #1 wouldn't be so bad.
I think the app has done a pretty good job at moving a lot of the index
maintenance code to a specific area. I'm thinking of the dashboard indexes
here.
3. no indirection between the application's logical schema and its
physical storage schema
Seems incorrect. I can change the physical storage schema (core schema or
even repo format) without affecting app code. Or am I misunderstanding
something?
Sorry, I am using the relational meaning of logical and physical. A logical
schema does not include indexes or views, while a physical schema does. I'm
also extending this to refer to the lack of distinction between our preferred
form of data as encapsulated objects, versus the best divisions of data from
a performance point of view.
In Chandler we've long had the distinction between capital-'I' Items and
lowercase-'i' items. This distinction has materialized most clearly in the
dump/reload/eim work, which is a way to export 'I' Items, whereas the
repository deals with 'i' items. Isn't this equivalent to what you're talking
about?
As for indexes, yes, you're correct: they're not part of the logical schema.
They're performance implementation details chosen by the app, just like in a
relational app, where the app ultimately has to know about table layout, keys,
and indexes, and put kludges into stored procedures, to make queries
efficient.
The core schema and repo format aren't a factor in this, as they're at an
even lower level than the "physical" schema I'm talking about. In the
repository today, the "physical" schema consists of whatever sets/collections
and indexes you create, which is rather analogous to creating indexes or
materialized views in an RDBMS, only without the same transparency. In an
RDBMS, if you add an index or a materialized view, it doesn't change how you
retrieve your data: it just goes faster. So you can do application specific
tuning without changing your application.
Same with the repository: it just goes faster. You don't have to change the
way you access your data once you've created indexes, except for random
row-number-based access, for which I didn't dare write the iterating APIs. But
if you look in the collection code, for iteration, membership tests, etc., it
takes the slow route if it can't find an index and the fast route if it can.
No need to change the access code at the app level to take advantage of the
indexes.
A repository index is a materialized view of a collection in relational terms.
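The transparency point can be illustrated with a minimal relational sketch
(using Python's stdlib sqlite3 module and a made-up "items" table, not
Chandler's actual storage): adding an index is purely a "physical" schema
change, and the application's query code is identical before and after.

```python
import sqlite3

# In-memory database with a hypothetical "items" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (uuid TEXT, displayName TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(str(i), "item-%d" % i) for i in range(1000)])

query = "SELECT uuid FROM items WHERE displayName = ?"

# Before indexing: the query works, via a full table scan.
before = conn.execute(query, ("item-42",)).fetchall()

# Tuning step: add an index. A "physical" schema change only.
conn.execute("CREATE INDEX items_by_name ON items (displayName)")

# The application code is unchanged; the same query now uses the index.
after = conn.execute(query, ("item-42",)).fetchall()

assert before == after == [("42",)]
```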
4. implementing a generic database inside another generic database
That was the goal, originally.
Not quite; having a generic database was the goal, not that it be implemented
*inside* another generic database. It is one thing to have a BerkeleyDB
persistence layer driven by the application's dynamic schema, and another one
altogether to implement a database on top of a fixed BerkeleyDB schema.
For comparison purposes, consider OpenLDAP: it is a generic, hierarchical,
networked database implemented atop BerkeleyDB. However, instead of having a
fixed schema for storing values, items, etc., in BerkeleyDB, it is
dynamically extended as attribute types and indexes are added. So the
database is *represented* in BerkeleyDB, rather than being implemented
*inside* BerkeleyDB.
I think we disagree or misunderstand each other here, or maybe I'm simply not
following you. While it's not relational, the Chandler repository has to jump
through the same hoops as OpenLDAP or MySQL to store anything in Berkeley DB.
Berkeley DB can only store key/value pairs of byte strings in b-trees, hashes,
queues, and a fourth structure whose name escapes me at the moment.
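In Python terms, that raw interface looks roughly like the stdlib dbm module
(used here only as a stand-in for Berkeley DB, since both expose a byte-string
key/value store; the "item:..." key scheme is invented for illustration):
anything richer - items, typed attributes, indexes - has to be flattened into
byte strings by the layer above.

```python
import dbm.dumb  # stdlib stand-in for a Berkeley DB byte-string store
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "store")
db = dbm.dumb.open(path, "c")

# The storage layer sees only opaque byte strings; the notion of an
# "item" with typed attributes must be serialized by the layer above.
db[b"item:42:displayName"] = b"Welcome note"
db[b"item:42:kind"] = b"Note"

assert db[b"item:42:displayName"] == b"Welcome note"
db.close()
```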
So, when I say it is implemented "inside" another database, I mean it in the
sense that the schema of the repository is not reflected in the schema of its
back-end storage, and thus cannot fully exploit the back-end's features for
maximum performance.
Can you give a specific example that would help me understand what you mean?
I'm not sure what you mean by "hard compiled". Nothing stops us from having
a relational schema that's extensible by parcels, or from doing so
dynamically. In truth, the schemas we use with the repository today are no
less "hard compiled". If we at some future time allow user-defined fields,
there are still ways to represent them within such a relatively-static
schema, or to simply modify the schema at runtime.
Once you've worked hard at extracting performance from your static schema, so
that queries and joins are not too massive, any extension throws the whole
effort into question all over again. Any plugin developer will have to
understand this. This was the main reason why we didn't choose this route five
years ago. Maybe now we no longer care as much about this aspect.
For example, in conversations I've had with Grant, he compared Chandler with
Mail.app and iCal.app, which have such static schemas and can perform much
better in their specific domains than the more generic Chandler.
If that's the route we'd like to take Chandler to, fine. That should be
clearly stated. I'm not exactly against it either, just a lot less excited
about it.
It'd be a different product, albeit one with a lot of the same visible 0.7/1.0
features of today, but a dead end nonetheless: Chandler would only ever do
what it's hardcoded to do (from a schema standpoint).
The last five years of work would be pretty much wasted, except for their
"what not to do" aspect :)
5. implementing generic indexes inside of generic indexes
How so? What are you thinking about?
The skip list system is the main one I have in mind, but if I correctly
understand how versions and values are stored, then those would be included
too.
Yes, a skip list implements the structure behind repository indexes. What are
the 'generic indexes' that you're saying skip lists are implemented inside?
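For readers unfamiliar with the structure under discussion: a skip list keeps
sorted keys reachable through layered "express lanes", giving O(log n)
expected search and insert. A minimal in-memory sketch follows; it is not the
repository's actual implementation, which persists its nodes in the backing
store.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one pointer per level

class SkipList:
    MAX_LEVEL = 8
    P = 0.5  # probability of promoting a node one level up

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        # Find the rightmost node before `key` at every level.
        update = [None] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        if lvl > self.level:
            for i in range(self.level + 1, lvl + 1):
                update[i] = self.head
            self.level = lvl
        new = Node(key, lvl)
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __contains__(self, key):
        # Descend from the top level, moving right while keys are smaller.
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key
```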
Andi..
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev