At 11:08 AM 10/9/2007 -0700, Philippe Bossut wrote:
On the idea of having a "small group attack[ing] architecture on the side", my concern is to make sure that the progress of this work is correctly tracked and focused on the right thing. My experience with this kind of project (2 in my past life as a manager and both unsuccessful) makes me prudent and wary: it's easy for the "rearchitecture guys" and the "product maintenance guys" to become so disconnected that both projects suffer and fail, the "rearchitecture project" going on a wild goose chase with grandiose objectives, the "product maintenance" seeing its immediate needs (say testability or performance) not addressed in any foreseeable future and growing downright negative on the rearchitecture. This is a very serious risk.

In a vacuum, it would certainly be reasonable to be wary of such projects. However, it should also be pointed out that in my time serving OSAF, we've successfully completed no less than four rearchitecture projects under my guidance, including the removal of parcel XML, the transition to "stamping as annotations", EIM-based sharing, replacing parcel discovery with eggs, replacing the old timer system with osaf.startup, and others.

All of these (not to mention the various greenfield architecture projects I worked on) were completed on or ahead of schedule, with high approval ratings for the results -- even from people who at first thought a particular project was unfeasible, unnecessary, or just a bad idea.

So, I think it's only fair to match your experience with two unsuccessful projects in other environments than this one, with my experience of 4+ successful ones in this environment.

Now, that's not to say that this effort can't fail -- anything can fail, of course. It's just that where Chandler is concerned, I haven't done so yet, and don't intend to start now. :) This is not braggadocio on my part, as it is not really a question of Chandler needing an especially *good* or "elegant" architecture; it would do quite well with a mediocre one -- as long as it was one reasonably *appropriate* to the tasks Chandler actually needs to perform.

Unfortunately, where Chandler has had any architecture at all (e.g. CPIA and the repository), it has typically been aimed at building an entirely different sort of application than the one we ended up developing!

Thus, Chandler's performance, reliability, code size, and testability have all been burdened by five-year-old assumptions about goals we have long ago stopped chasing. It is time to stop paying (and paying, and paying) for that legacy.


To mitigate that risk we'll need a formal monthly status point, reviewing what has been learned in that month, what the next action is, and how much more clarity the recent progress gives us on the overall schedule of the project. The rearchitecture project will have to be very open so that the rest of the team stays engaged and on board with the new architecture. We also have to have enough visibility that, if the goals seem to slip further away every month, we are able to call the project off and cut our losses. This is what could be called an "accountability" clause for the rearchitecture project.

Of course, accountability is a must. I myself would like to see a demo-capable version of Chandler on the new architecture (minus certain features such as sharing and email) by year-end, that offers significantly improved memory footprint, startup time, and UI responsiveness compared to its big brother.

That is, the improvements in the product should be visible to an end user, not just a developer. This is key for PR and funding reasons, to answer the inevitable (and misguided) "why are you rewriting" questions in situations where a nuanced reply won't be anywhere near as convincing as a side-by-side comparison.

If it works, we have much to gain, and if it fails, we lose only the time spent on the pilot by a limited group of people.



Testability
-----------
PJE: Testability is a requirement for the goals laid out in Katie's original email, and is therefore a "wedge issue" for the whole discussion. Testability is also necessary to build a developer community. Morgen: Testability has been vital for development and ongoing maintenance and feature development of syncing code. Heikki: Testability has not been a requirement in the development of a well-known family of open source web browsers whose process he happens to be familiar with. There was some discussion of testability in the thread following Aparna's "Desktop Test Automation Project" email. In the standard layered approach, we could tackle testability of presentation layer code with a mock wx.
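
As a rough illustration of the "mock wx" idea, here is a minimal sketch in which a stand-in object replaces a real widget. The function name show_item_count is hypothetical and not taken from Chandler; the only assumption about wx is that the widget exposes a SetLabel method, as wx.StaticText does. The test never imports wx at all.

```python
from unittest import mock


def show_item_count(label, items):
    """Presentation-layer code: write a summary into a label-like widget.

    `label` is assumed to offer SetLabel(text), as wx.StaticText does.
    """
    label.SetLabel("%d items" % len(items))


# In a test, a Mock stands in for the widget, so no GUI needs to be running.
fake_label = mock.Mock()
show_item_count(fake_label, ["a", "b", "c"])
fake_label.SetLabel.assert_called_once_with("3 items")
```

A mock of this kind only checks that the presentation code makes the expected calls; it says nothing about how the real widget renders.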

I can't bring myself to say that testability in and of itself should be the driver. IOW, if testability were our only issue with the architecture, we couldn't justify overhauling the architecture just for that and would rather think about other approaches to testing (test by community, use off-the-shelf testing frameworks, etc.).

One must look beyond the surface meaning of the word "testability" to see what I mean. A component that cannot be unit tested is a component that has undesirable couplings to globals or other components.

Undesirable coupling, in turn, means that a unit cannot be worked on separately, nor can it be relied upon as a solid base for development of other components. A unit that can't be treated as a black box becomes a development bottleneck, as progress relies on a limited number of "experts" whose breadth and depth of knowledge is substituted for adequate tests and documentation.

So, there is a lot more to the consequences on the project than the mere literal meaning of being able to test something. The lack of testability is the direct cause of many systemic project issues, as previously mentioned.
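
To make the coupling point concrete, here is a small sketch under stated assumptions: both function names are hypothetical, and the "global" in question is simply the system clock. The first version cannot be unit tested deterministically because it reaches out to global state; the second takes the clock as a parameter and can be treated as a black box.

```python
import time


def is_overdue_coupled(due_timestamp):
    """Coupled to a global: the result depends on the real wall clock,
    so a test cannot control what "now" means."""
    return time.time() > due_timestamp


def is_overdue(due_timestamp, now=time.time):
    """Decoupled: the clock is an explicit dependency with a sensible
    default, so tests can substitute a fixed one."""
    return now() > due_timestamp


# A test supplies its own clock and gets a repeatable answer.
fixed_clock = lambda: 200.0
print(is_overdue(100.0, now=fixed_clock))
print(is_overdue(300.0, now=fixed_clock))
```

The same pattern applies to larger dependencies (a repository, a UI toolkit): whatever a unit needs is handed to it, rather than looked up globally.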


CPIA/Persistence in the UI
--------------------------

Katie's original email suggested that we shouldn't be persisting so much of the UI structure of the app. John noted that transparent persistence was a refreshing contrast to other systems he'd worked on, where you had to write SQL queries every time you needed to persist something. Philippe pointed out that having everything be persistent is confusing for new developers: it's hard to make sense of all the attributes that get tied to even simple items when displayed in the UI.

PJE: What we need is to separate out visual presentation (not persistent) from application logic (e.g. which items are selected in which collections, etc). That leads to greater testability (you can test the application logic without the UI). Ideally, you could separate out persistence as well, which means you can run tests of the application without the repository. Greater separation means more opportunity for parallel development (i.e. of views vs interaction model vs persistence) of features.
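
One possible shape for this separation, sketched with hypothetical names (SelectionModel is not an existing Chandler class): the application logic tracks which items are selected and notifies listeners, and it knows nothing about wx or the repository. A view would subscribe to it; a test can subscribe a plain list instead.

```python
class SelectionModel:
    """Application logic only: which items of a collection are selected.

    No UI toolkit and no persistence layer is imported here.
    """

    def __init__(self, items):
        self.items = list(items)
        self.selected = set()
        self._listeners = []

    def subscribe(self, callback):
        """Register a callable invoked with the new selection on each change."""
        self._listeners.append(callback)

    def toggle(self, item):
        """Select the item if unselected, otherwise deselect it."""
        if item not in self.items:
            raise ValueError("unknown item: %r" % (item,))
        if item in self.selected:
            self.selected.discard(item)
        else:
            self.selected.add(item)
        snapshot = frozenset(self.selected)
        for callback in self._listeners:
            callback(snapshot)


# A test observes changes without any view existing.
model = SelectionModel(["note", "task"])
seen = []
model.subscribe(seen.append)
model.toggle("task")
print(seen)
```

Persistence could be attached the same way, as just another subscriber, which is what would allow tests to run without the repository.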

*Agreement* (Philippe, PJE, Reid, Andi, John, Mikeal): Cleaner separation of UI from the rest of the app. *Agreement*: (Philippe, PJE, Andi): Not persisting redundant/constant UI data. *Agreement* (John, PJE): The current template mechanism in Blocks is confusing and mostly a historical artifact, so it should be removed.

The CPIA remnants and what Reid described as the "wall of abstraction" must go. If we could organize the rearchitecture project so that this part was done first and merged with the trunk before the rest (full testability, performance and scalability), I'd vote for that.

Key to the approach that Grant and I envision, is that we are first and foremost *removing* code, rather than changing it in-place. After all, if we are changing existing code that does not have tests, how do we know whether it's working? So, it has to be done test-first, which means starting with no code, and adding tests. Once a test exists, only then is it safe to add in code. So in truth, the "walls of abstraction" in CPIA and the repository would be gone the moment the pilot project begins. :) (That is, they will be gone by simple virtue of not adding them to the branch.)

However, there would be no merging to the trunk. Instead, any new tests added on the trunk (and where applicable, the implementation of the corresponding fixes and feature additions) migrate to the branch, to keep it up to date. Then, when the time is right, the branch simply replaces the trunk.

In fact, calling it a branch is a bit of a misnomer, as even if we put it under branches/ in SVN, it'll likely be started with an empty directory, rather than a copy of the trunk (although where possible, we'll "svn cp" in files as they become useful, so that revision history remains.)


Performance, Scalability
------------------------

*Open Issue*: (Grant, Andi, Brian K) Unclear what the goals are w.r.t. email beyond what we have today. Need measurements of how Chandler performs in the presence of many items (and/or collections), as well as explicit performance goals.

As an objective, we should aim at handling thousands of items in Chandler: emails, events, tasks, snippets of all kinds (see Katie's email in that thread). So far though, it doesn't seem to me the most urgent issue, as I feel the one listed previously (CPIA architecture obscurity) is a roadblock on any effort. Also, one could think about addressing performance and scalability at the repo level, without changing the whole architecture.

One could think about it, but one would be unlikely to make much headway. :) As Andi says, the repository has come a long way, but this has mostly been through micro-optimization rather than looking at the overall approach, and we are running out of things to micro-optimize. Any major improvement will have to involve more than just the repository (as Andi has also pointed out).

Unfortunately, the overall approach we are using with the repository is actually an anti-pattern for this type of application, in my experience. In fact, we have what might be called anti-patterns all the way down, including:

1. application-level code meddling in storage-level details

2. lack of sufficient domain-specific query APIs

3. no indirection between the application's logical schema and its physical storage schema

4. implementing a generic database inside another generic database

5. implementing generic indexes inside of generic indexes

6. reimplementing all the guts of a relational database in Python, only without getting any of the benefits of actually using a relational data model (such as query transparency, or index/table definition independence at the application layer), or the performance/maintenance benefits of using an RDBMS written in C and maintained by someone else.

Note in particular, that getting rid of #6 eliminates #4 and #5, while making the other three a lot easier to fix.
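
As a sketch of what addressing items 1 through 3 could look like on top of an off-the-shelf relational store, here is a minimal example using Python's standard sqlite3 module. SQLite is only a stand-in chosen for illustration; the class name ItemStore, the table layout, and the method names are all hypothetical. Application code calls domain-specific queries and never sees tables or indexes, so the physical schema can change behind the API.

```python
import sqlite3


class ItemStore:
    """Domain-specific query API; the physical schema stays private."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        # Physical storage details live here and nowhere else.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            " id INTEGER PRIMARY KEY,"
            " kind TEXT NOT NULL,"
            " title TEXT NOT NULL,"
            " due TEXT)"
        )
        # The index is maintained by the database, not by application code.
        self._db.execute(
            "CREATE INDEX IF NOT EXISTS items_kind_due ON items (kind, due)"
        )

    def add_task(self, title, due):
        """Store a task; `due` is an ISO 8601 date string such as '2007-10-31'."""
        cursor = self._db.execute(
            "INSERT INTO items (kind, title, due) VALUES ('task', ?, ?)",
            (title, due),
        )
        self._db.commit()
        return cursor.lastrowid

    def tasks_due_before(self, date):
        """Return titles of tasks due strictly before `date`, earliest first."""
        rows = self._db.execute(
            "SELECT title FROM items"
            " WHERE kind = 'task' AND due < ? ORDER BY due",
            (date,),
        )
        return [title for (title,) in rows]


store = ItemStore()
store.add_task("Write status report", "2007-10-20")
store.add_task("Review pilot plan", "2007-10-12")
print(store.tasks_due_before("2007-10-31"))
```

Because callers only use add_task and tasks_due_before, splitting the table, adding columns, or swapping the backing database would not touch application-level code.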
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
