At 05:27 PM 10/9/2007 -0700, Andi Vajda wrote:
Not throw out. Migrate to a new schema. Just like in a relational database.
If you change the low-level layout (format), core schema, or app
schema (table layout) someone needs to migrate the data. It might be
apparently easier in a relational schema but not so once you've
carefully optimized it and duplicated stuff left and right to get
the desired performance. Essentially, it becomes harder once the 1-1
correspondence between the programmer's view (kind/class) and the SQL
table is broken.
Have a look at Hibernate, which is used by Cosmo: it uses an XML file
that specifies the mapping between objects and database. The
contents of this file are never known to the application, which
simply uses its own object model.
Hibernate maps object retrieval and queries to SQL, and applications
use either the collections defined by the mapping, or use "HQL",
which is an SQL-like query language that queries in terms of the
*object* schema, rather than the relational one. And it takes care
of all the non-1-1-ness in the mapping.
Now, if you add new types to the application schema, of course you
have to add to the XML file. But in principle you could generate the
XML in a logical fashion from the new piece of application schema, so
that even that step is not necessary when you are first adding to the
application.
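To illustrate the "generate the mapping from the application schema"
idea, here is a minimal sketch in Python. It is not how Hibernate or
Cosmo does it; the Note class, the SQL_TYPES table, and both helper
functions are hypothetical names made up for this example, and the
mapping is a plain dict rather than an XML file.

```python
from dataclasses import dataclass, fields

# Hypothetical lookup from Python field types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Note:
    """A hypothetical piece of application schema."""
    id: int
    title: str
    body: str


def derive_mapping(cls):
    """Derive a default (1-1) table mapping from a dataclass definition."""
    return {
        "table": cls.__name__.lower(),
        "columns": {f.name: SQL_TYPES[f.type] for f in fields(cls)},
    }


def create_table_sql(mapping):
    """Render the mapping as a CREATE TABLE statement."""
    cols = ", ".join(
        f"{name} {sqltype}" for name, sqltype in mapping["columns"].items()
    )
    return f"CREATE TABLE {mapping['table']} ({cols})"


print(create_table_sql(derive_mapping(Note)))
```

The derived mapping is only a starting point: it can later be edited
by hand (or replaced outright) to tune the physical layout, without
touching the Note class itself.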
Now, Hibernate is not available for Python (although I suppose you
could make it so with JCC!) but it illustrates the point that it is
possible to separate things in this fashion. I believe there is at
least one Python ORM that claims to be inspired by or to work like
Hibernate, though. I also seem to recall that SQLAlchemy for Python
has a great deal of flexibility in mapping between different
relational schemas, such that your code can deal with a logical
schema rather than an actual one.
There is also the possibility of just rolling Yet Another Python ORM,
perhaps based on EIM. But these things don't matter as much as
layering the application in such a way that it does not *care* how
things actually get stored. Chandler's domain model objects should
not be subclasses of a storage type, for example. (i.e., they should
not be repository.Items).
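A rough sketch of that layering, using only the Python standard
library. Event and SqliteEventStore are hypothetical names for this
example, not Chandler classes; the point is only that the domain
object has no storage base class, and that the (deliberately odd)
table and column names are visible to the store alone.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Event:
    """Plain domain object: it knows nothing about how it is stored."""
    uuid: str
    title: str


class SqliteEventStore:
    """Hypothetical mapping layer; the physical layout lives only here."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ev (k TEXT PRIMARY KEY, t TEXT)"
        )

    def save(self, event):
        self.conn.execute(
            "INSERT OR REPLACE INTO ev (k, t) VALUES (?, ?)",
            (event.uuid, event.title),
        )

    def load(self, uuid):
        row = self.conn.execute(
            "SELECT k, t FROM ev WHERE k = ?", (uuid,)
        ).fetchone()
        return Event(*row) if row else None


store = SqliteEventStore(sqlite3.connect(":memory:"))
store.save(Event("e1", "Standup"))
loaded = store.load("e1")
print(loaded)
```

Any other object offering the same save() and load() methods could
stand in for SqliteEventStore, and the application code that works
with Event would not need to change.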
That way, we will be able to experiment with different mappings and
different back ends for optimum performance. For that matter, we
could use more than one back end if we chose, such that email bodies
might be stored in mbox files, while their headers get indexed in
SQLite. (While all being dumpable and reloadable, of course.)
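The mbox-plus-SQLite split might look something like the following
sketch, which uses the standard library's mailbox and sqlite3
modules. MailStore, its methods, and the headers table are
hypothetical names invented for the example, and the dump/reload
side is left out entirely.

```python
import mailbox
import os
import sqlite3
import tempfile
from email.message import EmailMessage


class MailStore:
    """Hypothetical store: full messages in mbox, headers indexed in SQLite."""

    def __init__(self, mbox_path, conn):
        self.box = mailbox.mbox(mbox_path)
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS headers "
            "(mbox_key INTEGER PRIMARY KEY, subject TEXT, sender TEXT)"
        )

    def add(self, msg):
        key = self.box.add(msg)  # the message itself goes to the mbox file
        self.box.flush()
        self.conn.execute(       # only the headers go to the index
            "INSERT INTO headers VALUES (?, ?, ?)",
            (key, msg["Subject"], msg["From"]),
        )
        return key

    def find_by_subject(self, text):
        rows = self.conn.execute(
            "SELECT mbox_key FROM headers WHERE subject LIKE ?",
            (f"%{text}%",),
        ).fetchall()
        return [self.box[key] for (key,) in rows]

    def close(self):
        self.box.close()
        self.conn.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = MailStore(
            os.path.join(tmp, "mail.mbox"), sqlite3.connect(":memory:")
        )
        msg = EmailMessage()
        msg["From"] = "alice@example.com"
        msg["Subject"] = "Repository mapping"
        msg.set_content("Body lives in the mbox file.")
        store.add(msg)
        hits = store.find_by_subject("mapping")
        result = [(m["Subject"], m.get_payload().strip()) for m in hits]
        store.close()
        return result


print(demo())
```

Callers see a single store object either way; whether the bodies sit
in an mbox file, a table, or somewhere else is a decision that can be
revisited behind that interface.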
And, it is likely that for some period, we will still back-end to the
repository -- we just would go through a mapping layer of some sort
first. (And that would mean that we could do some physical schema
tuning there, without needing to mess with the application layer.)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev