On Tue, 9 Oct 2007, Grant Baillie wrote:
Also, one could think about addressing performance and scalability at the
repo level, without changing the whole architecture.
While Chandler was developed with an infinitely scalable and infinitely
fast repository in mind, it might be time to let reality sink in. The
repository has come a long way in terms of performance and could still be
improved, for sure, but couldn't one think about addressing performance
and scalability at the app level as well, without changing the whole
repository architecture?
Well, one can think about anything, so sure :). But as things stand, there
isn't really an "app level" to speak of: The repository is intertwined with
everything, and its API shapes the app layer in ways that aren't always so
effective. (The current indexing situation is one concrete example).
In other words, it's up to the app to dis-intertwine itself from the
repository. I don't think that tackling repository performance in
isolation, as has been the approach until now, is the right solution anymore.
If, for instance, when importing 100,000 mail messages we tell the UI about
every itsy-bitsy change, one attribute at a time, no amount of repository
performance improvement is going to get us to the performance we expect.
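To make the batching idea concrete, here is a minimal sketch of coalescing change notifications so the UI hears about a batch of items once, rather than once per attribute. `BatchedNotifier`, `changed()` and the listener callback are hypothetical names for illustration only; this is not Chandler's notification API.

```python
from collections import defaultdict


class BatchedNotifier:
    """Hypothetical helper (not a Chandler API): records which attributes of
    which items changed, and hands listeners one combined batch per
    `with` block instead of one callback per attribute change."""

    def __init__(self):
        self._listeners = []
        self._pending = defaultdict(set)  # item id -> names of changed attributes
        self._depth = 0  # > 0 while inside one or more `with notifier:` blocks

    def add_listener(self, callback):
        self._listeners.append(callback)

    def changed(self, item_id, attribute):
        self._pending[item_id].add(attribute)
        if self._depth == 0:
            self._flush()  # outside a batch: notify immediately

    def __enter__(self):
        self._depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._depth -= 1
        if self._depth == 0:
            self._flush()  # end of the outermost batch: one notification
        return False  # never swallow exceptions

    def _flush(self):
        if not self._pending:
            return
        batch = {item: frozenset(attrs) for item, attrs in self._pending.items()}
        self._pending.clear()
        for callback in self._listeners:
            callback(batch)
```

Wrapping an import loop in `with notifier:` turns, say, 100,000 messages with a dozen attributes each into one listener call instead of roughly 1.2 million; a real importer would probably flush per chunk so the UI still shows progress.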
About mail import performance, I need to point out that the message in the
status bar at the bottom of the UI is misleading. It says "committing <n>
messages" implying that it's spending time inserting item records into the
repository.
Since this conversation is now in the mode where we're throwing around
row-insert timings, how about changing the message to say something like
"converting mail messages to Chandler items"? The actual repository insert,
the commit() call, is pretty small, even negligible, compared to the time
spent "chandlerizing" the mail messages into items with a live UI. I sure
don't want people to think that it takes half an hour to write 7,000 mail
message items into the repository.
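One way to put numbers behind the claim that commit() is a small fraction of import time is to time the two phases separately. A minimal sketch, assuming the import can be split into a caller-supplied `convert` step and a `commit` step; `timed`, `import_messages`, `convert` and `commit` are hypothetical stand-ins, not Chandler functions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(timings, phase):
    """Accumulate wall-clock seconds spent in `phase` into the `timings` dict."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = timings.get(phase, 0.0) + (time.perf_counter() - start)


def import_messages(raw_messages, convert, commit, timings):
    """Hypothetical import loop: `convert` turns one raw message into an item,
    `commit` persists the whole list of items."""
    items = []
    with timed(timings, "convert"):
        for raw in raw_messages:
            items.append(convert(raw))
    with timed(timings, "commit"):
        commit(items)
    return items
```

The same per-phase totals could drive the status-bar text, so it names whichever phase is actually consuming the time.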
Earlier today, Heikki proposed using multiple processes to better take
advantage of multi-core hardware. Berkeley DB and the Chandler repository
already fully support multiple processes accessing the same repository
concurrently. It should be fairly easy for the application to split off some
tasks into separate processes without any code changes in the task or
repository components themselves. Importing a large amount of mail in a
different process or background syncing collections in a different process
could yield some interesting results.
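As a rough illustration of splitting an import into its own process, the sketch below has a child process write into a store while the parent (standing in for the UI process) waits and then reads the result through its own connection. It uses SQLite purely as a stand-in for a store that supports multi-process access, since Berkeley DB is not in Python 3's standard library; it does not use Chandler's repository API, and `import_mail`, `run_background_import` and the `items` table are hypothetical.

```python
import multiprocessing
import os
import sqlite3
import tempfile


def import_mail(db_path, messages, batch_size=500):
    """Runs in the child process. Opens its own connection to the shared
    store, converts raw (subject, sender) tuples into rows, and commits
    one transaction per batch."""
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        for start in range(0, len(messages), batch_size):
            batch = messages[start:start + batch_size]
            # Stand-in for "chandlerizing": normalize fields into item rows.
            rows = [(subject.strip(), sender.lower()) for subject, sender in batch]
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT INTO items (subject, sender) VALUES (?, ?)", rows)
    finally:
        conn.close()


def run_background_import(db_path, messages):
    """Creates the schema, runs import_mail in a separate process, waits for
    it, then reads the result back through a fresh connection."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS items (subject TEXT, sender TEXT)")
    conn.close()  # don't carry an open connection across fork()

    # "fork" is POSIX-only; it avoids re-importing __main__ in the child.
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=import_mail, args=(db_path, messages))
    proc.start()
    # A real UI process would keep handling events here instead of blocking.
    proc.join()
    if proc.exitcode != 0:
        raise RuntimeError(f"import process exited with code {proc.exitcode}")

    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "repository.db")
    msgs = [(f" Subject {i} ", f"User{i}@Example.com") for i in range(2000)]
    print(run_background_import(path, msgs))
```

Each process opens its own connection rather than sharing one, which mirrors the point above: the importer needs no special code beyond running somewhere else, and the store's own locking handles the concurrency.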
Andi..
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev