At 11:08 AM 10/9/2007 -0700, Philippe Bossut wrote:
On the idea of having a "small group attack[ing] architecture on the
side", my concern is to make sure that the progress of this work is
correctly tracked and focused on the right thing. My experience with
this kind of project (two in my past life as a manager, both
unsuccessful) makes me prudent and wary: it's easy for the
"rearchitecture guys" and the "product maintenance guys" to become so
disconnected that both projects suffer and fail: the
"rearchitecture project" goes on a wild goose chase after grandiose
objectives, while the "product maintenance" side sees its immediate
needs (say, testability or performance) go unaddressed for the
foreseeable future and grows downright negative about the
rearchitecture. This is a very serious risk.
In a vacuum, it would certainly be reasonable to be wary of such
projects. However, it should also be pointed out that in my time
serving OSAF, we've successfully completed no fewer than four
rearchitecture projects under my guidance, including the removal of
parcel XML, the transition to "stamping as annotations", EIM-based
sharing, replacing parcel discovery with eggs, replacing the old
timer system with osaf.startup, and others.
All of these (not to mention the various greenfield architecture
projects I worked on) were completed on or ahead of schedule, with
high approval ratings for the results -- even from people who at
first thought a particular project was unfeasible, unnecessary, or
just a bad idea.
So, I think it's only fair to set your experience of two
unsuccessful projects in environments other than this one against my
experience of 4+ successful ones in this environment.
Now, that's not to say that this effort can't fail -- anything can
fail, of course. It's just that where Chandler is concerned, I
haven't done so yet, and don't intend to start now. :) This is not
braggadocio on my part, as it is not really a question of Chandler
needing an especially *good* or "elegant" architecture; it would do
quite well with a mediocre one -- as long as it was one reasonably
*appropriate* to the tasks Chandler actually needs to perform.
Unfortunately, where Chandler has had any architecture at all (e.g.
CPIA and the repository), it has typically been aimed at building an
entirely different sort of application than the one we ended up developing!
Thus, Chandler's performance, reliability, code size, and testability
have all been burdened by five-year-old assumptions about goals we
long ago stopped chasing. It is time to stop paying (and
paying, and paying) for that legacy.
To mitigate that risk we'll need a formal monthly status review,
covering what has been learned that month, what the next action is,
and how much more clarity the recent progress gives us on the
overall schedule of the project. The rearchitecture project will
have to be very open so that the rest of the team stays engaged and
on board with the new architecture. We also need enough
visibility that, if the goals seem to slip further away every
month, we can call the project off and cut our
losses. This is what could be called an "accountability" clause for
the rearchitecture project.
Of course, accountability is a must. I myself would like to see a
demo-capable version of Chandler on the new architecture (minus
certain features such as sharing and email) by year-end, one that offers
significantly improved memory footprint, startup time, and UI
responsiveness compared to its big brother.
That is, the improvements in the product should be visible to an end
user, not just a developer. This is key for PR and funding reasons,
to answer the inevitable (and misguided) "why are you rewriting"
questions in situations where a nuanced reply won't be anywhere near
as convincing as a side-by-side comparison.
If it works, we have much to gain, and if it fails, we lose only the
time spent on the pilot by a limited group of people.
Testability
-----------
PJE: Testability is a requirement for the goals laid out in Katie's
original email, and is therefore a "wedge issue" for the whole
discussion. Testability is also necessary to build a developer community.
Morgen: Testability has been vital for developing the syncing code
and for its ongoing maintenance and feature work.
Heikki: Testability has not been a requirement in the development
of a well-known family of open source web browsers whose process he
happens to be familiar with.
There was some discussion of testability in the thread following
Aparna's "Desktop Test Automation Project" email. In the standard
layered approach, we could tackle testability of presentation layer
code with a mock wx.
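As a rough illustration of the mock-wx idea, here is a minimal sketch that tests a piece of presentation code without importing wxPython at all. `TitleView`, its `show()` method, and the "Untitled" placeholder rule are hypothetical; `unittest.mock.MagicMock` stands in for a wx text control, and only the method name `SetValue` mirrors what a real `wx.TextCtrl` exposes.

```python
from unittest.mock import MagicMock


class TitleView:
    """Hypothetical presentation-layer class: copies an item's title
    into a text widget. It never imports wx; the widget is passed in."""

    def __init__(self, text_ctrl):
        self.text_ctrl = text_ctrl

    def show(self, title):
        # Hypothetical presentation rule: blank titles get a placeholder.
        self.text_ctrl.SetValue(title.strip() or "Untitled")


def test_blank_title_shows_placeholder():
    fake_ctrl = MagicMock()  # stands in for a wx.TextCtrl
    TitleView(fake_ctrl).show("   ")
    fake_ctrl.SetValue.assert_called_once_with("Untitled")


def test_title_is_stripped():
    fake_ctrl = MagicMock()
    TitleView(fake_ctrl).show("  Lunch  ")
    fake_ctrl.SetValue.assert_called_once_with("Lunch")


if __name__ == "__main__":
    test_blank_title_shows_placeholder()
    test_title_is_stripped()
    print("presentation tests passed")
```

The design choice that makes this possible is that the widget is injected rather than created inside the view, so a test can substitute a fake.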
I can't bring myself to say that testability in and of itself should
be the driver. IOW, if testability were our only issue with the
architecture, we couldn't justify overhauling the architecture just
for that and would rather think about other approaches to testing (test
by community, off-the-shelf testing frameworks, etc.).
One must look beyond the surface meaning of the word "testability" to
see what I mean. A component that cannot be unit tested is a
component that has undesirable couplings to globals or other components.
Undesirable coupling, in turn, means that a unit cannot be worked on
separately, nor can it be relied upon as a solid base for development
of other components. A unit that can't be treated as a black box
becomes a development bottleneck, as progress relies on a limited
number of "experts" whose breadth and depth of knowledge stands in
for adequate tests and documentation.
So the consequences for the project go well beyond the mere literal
meaning of being able to test something. The lack of
testability is the direct cause of many systemic project issues, as
previously mentioned.
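A minimal, hypothetical illustration of the coupling point: the first function reaches for a module-level global, so testing it means patching shared state; the second receives its data source as a parameter, so it can be exercised as a black box with a plain dict. `REPOSITORY`, `unread_count_global`, and `unread_count` are invented names, not Chandler APIs.

```python
# Hypothetical module-level global standing in for a shared repository.
REPOSITORY = {"inbox": [{"read": False}, {"read": True}]}


def unread_count_global(collection_name):
    # Coupled: silently depends on whatever REPOSITORY holds right now,
    # so a test has to mutate or patch module state to exercise it.
    return sum(1 for item in REPOSITORY[collection_name] if not item["read"])


def unread_count(collections, collection_name):
    # Decoupled: the data source is a parameter, so the function can be
    # tested (and reused) without any global setup.
    return sum(1 for item in collections[collection_name] if not item["read"])


fixture = {"todo": [{"read": False}, {"read": False}, {"read": True}]}
print(unread_count(fixture, "todo"))  # 2
```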
CPIA/Persistence in the UI
--------------------------
Katie's original email suggested that we shouldn't be persisting so
much of the UI structure of the app. John noted that transparent
persistence was a refreshing contrast to other systems he'd worked
on, where you had to write SQL queries every time you needed to
persist something. Philippe pointed out that having everything be
persistent is confusing for new developers: it's hard to make sense
of all the attributes that get tied to even simple items when
displayed in the UI.
PJE: What we need is to separate out visual presentation (not
persistent) from application logic (e.g. which items are selected
in which collections, etc). That leads to greater testability (you
can test the application logic without the UI). Ideally, you could
separate out persistence as well, which means you can run tests of
the application without the repository. Greater separation means
more opportunity for parallel development (i.e. of views vs
interaction model vs persistence) of features.
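A sketch of that three-way split, under the assumption that "application logic" here means something like per-collection selection state. `SelectionStore`, `InMemoryStore`, and `SelectionModel` are hypothetical names; a repository-backed store would implement the same two methods, and a wx view would call `toggle()` without the model knowing anything about widgets.

```python
from typing import Dict, Protocol, Set


class SelectionStore(Protocol):
    """Hypothetical persistence interface; a repository-backed version
    would implement the same two methods."""

    def load(self, collection: str) -> Set[str]: ...

    def save(self, collection: str, item_ids: Set[str]) -> None: ...


class InMemoryStore:
    """Test double: keeps selections in a dict, no repository needed."""

    def __init__(self) -> None:
        self._data: Dict[str, Set[str]] = {}

    def load(self, collection: str) -> Set[str]:
        # Return a copy so callers cannot mutate stored state directly.
        return set(self._data.get(collection, set()))

    def save(self, collection: str, item_ids: Set[str]) -> None:
        self._data[collection] = set(item_ids)


class SelectionModel:
    """Application logic: which items are selected in which collection.
    It knows nothing about widgets (presentation) or storage details."""

    def __init__(self, store: SelectionStore) -> None:
        self.store = store

    def toggle(self, collection: str, item_id: str) -> Set[str]:
        selected = self.store.load(collection)
        if item_id in selected:
            selected.remove(item_id)
        else:
            selected.add(item_id)
        self.store.save(collection, selected)
        return selected


model = SelectionModel(InMemoryStore())
model.toggle("Work", "item-1")
model.toggle("Work", "item-2")
print(sorted(model.toggle("Work", "item-1")))  # ['item-2']
```

Because the model depends only on the `SelectionStore` interface, its tests run without either the UI or the repository, and the three pieces can be developed in parallel.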
*Agreement* (Philippe, PJE, Reid, Andi, John, Mikeal): Cleaner
separation of UI from the rest of the app.
*Agreement* (Philippe, PJE, Andi): Not persisting
redundant/constant UI data.
*Agreement* (John, PJE): The current template mechanism in Blocks
is confusing and mostly a historical artifact, so it should be removed.
The CPIA remnants and what Reid described as the "wall of
abstraction" must go. If we could organize the rearchitecture
project so that this part was done first and merged with the trunk
before the rest (full testability, performance and scalability), I'd
vote for that.
Key to the approach that Grant and I envision is that we are first
and foremost *removing* code, rather than changing it
in-place. After all, if we are changing existing code that does not
have tests, how do we know whether it's working? So, it has to be
done test-first, which means starting with no code and adding
tests. Only once a test exists is it safe to add code. So
in truth, the "walls of abstraction" in CPIA and the repository would
be gone the moment the pilot project begins. :) (That is, they will
be gone by simple virtue of not adding them to the branch.)
However, there would be no merging to the trunk. Instead, any new
tests added on the trunk (and where applicable, the implementation of
the corresponding fixes and feature additions) migrate to the branch,
to keep it up to date. Then, when the time is right, the branch
simply replaces the trunk.
In fact, calling it a branch is a bit of a misnomer, as even if we
put it under branches/ in SVN, it'll likely be started with an empty
directory, rather than a copy of the trunk (although where possible,
we'll "svn cp" in files as they become useful, so that revision
history remains.)
Performance, Scalability
------------------------
*Open Issue*: (Grant, Andi, Brian K) Unclear what the goals are
w.r.t. email beyond what we have today. Need measurements of how
Chandler performs in the presence of many items (and/or
collections), as well as explicit performance goals.
As an objective, we should aim at handling thousands of items in
Chandler: emails, events, tasks, snippets of all kinds (see Katie's
email in that thread). So far, though, it doesn't seem to me to be the
most urgent issue, as I feel the one listed previously (CPIA architecture
obscurity) is a roadblock for any effort. Also, one could think
about addressing performance and scalability at the repo level,
without changing the whole architecture.
One could think about it, but one would be unlikely to make much
headway. :) As Andi says, the repository has come a long way, but
this has mostly been through micro-optimization rather than looking
at the overall approach, and we are running out of things to
micro-optimize. Any major improvement will have to involve more than
just the repository (as Andi has also pointed out).
Unfortunately, the overall approach we are using with the repository
is actually an anti-pattern for this type of application, in my
experience. In fact, we have what might be called anti-patterns all
the way down, including:
1. application-level code meddling in storage-level details
2. lack of sufficient domain-specific query APIs
3. no indirection between the application's logical schema and its
physical storage schema
4. implementing a generic database inside another generic database
5. implementing generic indexes inside of generic indexes
6. reimplementing all the guts of a relational database in Python,
only without getting any of the benefits of actually using a
relational data model (such as query transparency, or index/table
definition independence at the application layer), or the
performance/maintenance benefits of using an RDBMS written in C and
maintained by someone else.
Note in particular that getting rid of #6 eliminates #4 and #5,
while making the other three a lot easier to fix.
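To make items 2, 3 and 6 a little more concrete, here is a hedged sketch, not a proposal for Chandler's actual schema: application code calls a small domain-specific query method (`events_between`, hypothetical) while the physical table and index layout lives only inside one storage class and is handled by SQLite, a relational engine written in C that ships with Python as the `sqlite3` module. The `EventStore` class and all table and column names are invented for illustration.

```python
import sqlite3
from datetime import datetime


class EventStore:
    """Hypothetical storage module. Only this class knows the physical
    schema; callers see domain-level methods, not tables or indexes."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS event ("
            " id INTEGER PRIMARY KEY, title TEXT NOT NULL,"
            " start_ts TEXT NOT NULL)"
        )
        # Index choice is a storage-level decision, invisible to callers.
        self.db.execute(
            "CREATE INDEX IF NOT EXISTS event_start ON event (start_ts)"
        )

    def add_event(self, title, start):
        # ISO 8601 strings of fixed width sort in chronological order.
        self.db.execute(
            "INSERT INTO event (title, start_ts) VALUES (?, ?)",
            (title, start.isoformat()),
        )
        self.db.commit()

    def events_between(self, start, end):
        """Domain-specific query: titles of events starting in [start, end)."""
        rows = self.db.execute(
            "SELECT title FROM event WHERE start_ts >= ? AND start_ts < ?"
            " ORDER BY start_ts",
            (start.isoformat(), end.isoformat()),
        )
        return [title for (title,) in rows]


store = EventStore()
store.add_event("Standup", datetime(2007, 10, 9, 9, 0))
store.add_event("Lunch", datetime(2007, 10, 9, 12, 0))
store.add_event("Retro", datetime(2007, 10, 10, 15, 0))
print(store.events_between(datetime(2007, 10, 9), datetime(2007, 10, 10)))
# ['Standup', 'Lunch']
```

If the physical layout later changes (a different index, a split table), only `EventStore` changes; callers of `events_between` do not.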
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev