On Tue, 9 Oct 2007, Phillip J. Eby wrote:

At 04:12 PM 10/9/2007 -0700, Andi Vajda wrote:

On Tue, 9 Oct 2007, Phillip J. Eby wrote:

1. application-level code meddling in storage-level details

Could you give some examples?

Any place where the application is creating collections or working with indexes in order to get better performance than "naive" iteration or queries would provide.

I see. Creating a collection is like creating a query. In the relational world, do you propose that the app not write queries?

On indexes, I see your point, I think. An index is a query's cache, not something one would want to expose to the app. Funnily enough, indexes _became_ this later; they started as a way to access collections by row number for display in the UI. Later they became a query's cache (when we implemented abstract sets and collections), and then they were also used as a way of persisting sort order.

In any case, I don't see how this is different in a relational model.
Once you work on extracting performance from a relational app, you end up writing hardcoded queries that carry very specific app knowledge.

But maybe this thread isn't about relational vs. object, as I'm afraid it is, but rather about better app layering?

2. lack of sufficient domain-specific query APIs

Again, please give an example of what you'd like?

This isn't a repository problem - it's a domain-layer problem. If the places where we're doing #1 were at least consolidated to single points of reference, #1 wouldn't be so bad.

I think the app has done a pretty good job at moving a lot of the index maintenance code to a specific area. I'm thinking of the dashboard indexes here.

3. no indirection between the application's logical schema and its physical storage schema

Seems incorrect. I can change the physical storage schema (core schema or even repo format) without affecting app code. Or am I misunderstanding something?

Sorry, I am using the relational meaning of logical and physical. A logical schema does not include indexes or views, while a physical schema does. I'm also extending this to refer to the lack of distinction between our preferred form of data as encapsulated objects, versus the best divisions of data from a performance point of view.

In Chandler we've long had the distinction between capital-'I' Items and lowercase-'i' items. This distinction materialized most clearly with the dump/reload/EIM work, which is a way to export 'I' Items. The repository, on the other hand, deals with 'i' items. Isn't this equivalent to what you're talking about?

As for indexes, yes, you're correct. They're not part of the logical schema. They're performance implementation details chosen by the app, just like in a relational app, where the app ultimately has to know about table layout, keys, and indexes, and put kludges into stored procedures, to make queries efficient.

The core schema and repo format aren't a factor in this, as they're at an even lower level than the "physical" schema I'm talking about. In the repository today, the "physical" schema consists of whatever sets/collections and indexes you create, which is rather analogous to creating indexes or materialized views in an RDBMS, only without the same transparency. In an RDBMS, if you add an index or a materialized view, it doesn't change how you retrieve your data: it just goes faster. So you can do application-specific tuning without changing your application.

Same with the repository: it just goes faster. You don't have to change the way you access your data once you've created indexes, except for random row-number-based access, for which I didn't dare write the iterating APIs. But if you look at the collection code, it takes the slow route if it can't find an index and the fast route if it can, for iteration, membership, etc. No need to change the access code at the app level to take advantage of the indexes. In relational terms, a repository index is a materialized view of a collection.
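To make the fast-route/slow-route idea concrete, here's a minimal sketch of a collection whose access code never changes once an index exists. The class and method names are made up for illustration; this is not the actual repository API.

```python
class Collection:
    """A collection whose app-facing access code never changes:
    it just goes faster once an index exists (hypothetical sketch)."""

    def __init__(self, items):
        self._items = list(items)   # insertion order, the "naive" storage
        self._index = None          # optional (sorted list, membership set)

    def create_index(self):
        # Analogous to a materialized view: precomputed, kept alongside
        # the data, purely a performance (and sort-order) artifact.
        self._index = (sorted(self._items), set(self._items))

    def __contains__(self, value):
        if self._index is not None:
            return value in self._index[1]                  # fast route
        return any(item == value for item in self._items)   # slow route: O(n)

    def __iter__(self):
        if self._index is not None:
            return iter(self._index[0])  # the index also persists sort order
        return iter(self._items)
```

Callers write `value in collection` and `for item in collection` either way; creating the index only changes the cost (and, here, the iteration order), not the access code.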

4. implementing a generic database inside another generic database

That was the goal, originally.

Not quite; having a generic database was the goal, not that it be implemented *inside* another generic database. It is one thing to have a BerkeleyDB persistence layer driven by the application's dynamic schema, and quite another to implement a database on top of a fixed BerkeleyDB schema.

For comparison purposes, consider OpenLDAP: it is a generic, hierarchical, networked database implemented atop BerkeleyDB. However, instead of having a fixed schema for storing values, items, etc., in BerkeleyDB, it is dynamically extended as attribute types and indexes are added. So the database is *represented* in BerkeleyDB, rather than being implemented *inside* BerkeleyDB.
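The "implemented inside" vs. "represented in" distinction might be sketched like this, with sqlite3 standing in for the back-end. Both table layouts are invented for illustration and don't correspond to any real Chandler or OpenLDAP schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# "Implemented inside": one fixed, generic key/value schema. Every item
# attribute is squeezed into the same table, so the back-end knows nothing
# about the application schema and can't natively index or query it.
con.execute("CREATE TABLE kv (item TEXT, attr TEXT, value TEXT)")
con.execute("INSERT INTO kv VALUES ('item1', 'displayName', 'Lunch')")

# "Represented in": the application schema is reflected in the back-end.
# Adding an attribute type adds a real column and a real index, so the
# back-end's own query machinery runs at full speed, OpenLDAP-style.
con.execute("CREATE TABLE events (item TEXT PRIMARY KEY)")
con.execute("ALTER TABLE events ADD COLUMN displayName TEXT")
con.execute("CREATE INDEX ix_events_name ON events (displayName)")
con.execute("INSERT INTO events VALUES ('item1', 'Lunch')")
```

In the first layout every lookup funnels through the same generic table; in the second, the back-end can use `ix_events_name` directly to answer queries over `displayName`.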

I think we disagree or misunderstand each other here, or maybe I'm simply not following you. While it's not relational, the Chandler repository has to jump through the same hoops as OpenLDAP or MySQL to store anything in Berkeley DB. Berkeley DB can only store key/value pairs of byte strings in b-trees, hashes, queues, and a fourth structure whose name escapes me at the moment.
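The flattening this forces on any database built atop Berkeley DB can be sketched as follows. A plain dict stands in for a b-tree here, and the key layout is invented for illustration; the repository's actual record format is more involved.

```python
import json

# Stand-in for a Berkeley DB b-tree: it can only map byte-string
# keys to byte-string values, nothing richer.
btree = {}

def store_item(uuid, attrs):
    # Flatten an item into key/value byte strings, since that is all the
    # back-end can hold; its schema never learns about the item's shape.
    for name, value in attrs.items():
        key = ("%s/%s" % (uuid, name)).encode("utf-8")
        btree[key] = json.dumps(value).encode("utf-8")

store_item("a1b2", {"displayName": "Lunch", "rank": 3})
```

Whatever structure the items have, the back-end only ever sees opaque byte strings, which is the sense in which the database lives "inside" rather than being "represented in" its storage.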

So, when I say it is implemented "inside" another database, I mean it in the sense that the schema of the repository is not reflected in the schema of its back-end storage, and thus cannot fully exploit the back-end's features for maximum performance.

Can you give a specific example that would help me understand what you mean?

I'm not sure what you mean by "hard compiled". Nothing stops us from having a relational schema that's extensible by parcels, or from doing so dynamically. In truth, the schemas we use with the repository today are no less "hard compiled". If we at some future time allow user-defined fields, there are still ways to represent them within such a relatively static schema, or to simply modify the schema at runtime.

Once you've worked hard at extracting performance from your static schema, so that queries and joins don't become too massive, any extension throws the whole effort into question, over and over again. Any plugin developer will have to understand this. This was the main reason we didn't choose this route five years ago. Maybe now we don't care as much about this aspect anymore.

For example, in conversations I've had with Grant, he compared Chandler with Mail.app and iCal.app, which have such static schemas and can perform much better in their specific domains than the more generic Chandler.

If that's the route we'd like to take Chandler down, fine. That should be clearly stated. I'm not exactly against it, either; I'm just a lot less excited about it.

It'd be a different product, albeit one with a lot of today's visible 0.7/1.0 features, but a dead end nonetheless. Chandler would only ever do what it's hardcoded to do (from a schema standpoint).

The last five years of work would be pretty much wasted, except for their "what not to do" aspect :)

5. implementing generic indexes inside of generic indexes

How so? What are you thinking about?

The skip list system is the main one I have in mind, but if I correctly understand how versions and values are stored, then those would be included too.

Yes, a skip list implements the structure behind repository indexes. What are the "generic indexes" that skip lists are implemented in that you're talking about?
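For readers unfamiliar with the structure under discussion: a minimal skip list looks roughly like this. It is an illustrative sketch only, not the repository's actual implementation; each node appears at level 0 and, with probability 1/2 per level, in higher "express" lanes, giving O(log n) expected search plus ordered iteration.

```python
import random

MAX_LEVEL = 8  # cap on express levels; enough for small examples

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # next node at each level

class SkipList:
    def __init__(self):
        self.head = Node(None, MAX_LEVEL)  # sentinel, present at all levels
        self.level = 0

    def _random_level(self):
        lvl = 0
        while random.random() < 0.5 and lvl < MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        # Record, per level, the last node before the insertion point.
        update = [self.head] * (MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl + 1):  # splice the new node into each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __contains__(self, key):
        node = self.head
        for i in range(self.level, -1, -1):  # descend through express lanes
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self.head.forward[0]  # level 0 links every node in key order
        while node is not None:
            yield node.key
            node = node.forward[0]
```

The level-0 chain is what gives an index its ordered iteration; the higher levels exist purely to make lookups fast.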

Andi..
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
