Hi, all
This is a summary of the thinking behind a change I recently
committed to iCalendar import:
<http://viewcvs.osafoundation.org/chandler/?rev=14887&view=rev>
that speeds up the 3000-event calendar import test case by 35%-40%.
There are similar gains to be had in other performance scenarios
(like reload, and possibly subscribe), but there's some trickiness
involved, so it's good to document things.
The main part of the diff (in parcels/osaf/sharing/stateless.py) is
to wrap the import code in a "with repoView.reindexingDeferred():"
block. It turns out that before this change, we were spending an
enormous amount of time reinserting items in indexes, as a result of
setting attributes that could affect the various indexes Chandler uses.
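
To make the cost concrete, here is a toy model of the idea, not
Chandler's actual repository code. ToyView, set_key(),
reindexing_deferred() and import_events() are all hypothetical names
invented for this sketch; the only thing taken from the real change is
the shape of the "with" block. It assumes each attribute set triggers
one index reinsertion unless deferral is active, in which case each
touched item is reindexed once at the end.

```python
import bisect
from contextlib import contextmanager


class ToyView:
    """Hypothetical stand-in for a repository view (not Chandler's API)."""

    def __init__(self):
        self.keys = {}        # item name -> current value of the indexed attribute
        self.index = []       # sorted list of (key, name) pairs
        self._indexed = {}    # item name -> key currently stored in the index
        self._dirty = set()   # items changed while reindexing is deferred
        self._deferred = False
        self.reinserts = 0    # number of index insertions performed

    def _reindex(self, name):
        # Remove the stale entry (if any), then insert at the new position.
        if name in self._indexed:
            self.index.remove((self._indexed[name], name))
        key = self.keys[name]
        bisect.insort(self.index, (key, name))
        self._indexed[name] = key
        self.reinserts += 1

    def set_key(self, name, key):
        self.keys[name] = key
        if self._deferred:
            self._dirty.add(name)   # remember it, fix the index later
        else:
            self._reindex(name)     # pay for a reinsertion right away

    @contextmanager
    def reindexing_deferred(self):
        self._deferred = True
        try:
            yield self
        finally:
            self._deferred = False
            for name in sorted(self._dirty):
                self._reindex(name)  # one reinsertion per touched item
            self._dirty.clear()


def import_events(view, defer, count=100):
    """Set the indexed attribute three times per event, as an import might."""
    def work():
        for i in range(count):
            name = "event%d" % i
            view.set_key(name, 0)
            view.set_key(name, i)
            view.set_key(name, i + 1)

    if defer:
        with view.reindexing_deferred():
            work()
    else:
        work()
    return view.reinserts
```

In this model the eager import does three reinsertions per event and
the deferred import does one, and both end up with the same index
contents. The real savings depend on how many index-affecting
attributes an import sets per item.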
Currently, there are two usage patterns for repository indexes in
Chandler:
1) Indexes used to make sure items are unique: The cases I know of
are the EmailAddress and Location kinds. We don't want to create a
new item every time you address an item to a given email address, so we
index the collection of all EmailAddress items, and use the index
(actually, multiple indexes) to use an existing item if possible when
you add or import an email address.
2) Indexes used for sorting or searching in the UI: Examples here are
the indexes used for sorting on dashboard columns, and also the global
startTime-related indexes used by the calendar UI to find all the
relevant events for a given week/day.
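
As a rough illustration of pattern #1, the lookup-before-create logic
might look like the sketch below. This is not the EmailAddress code
from Chandler: AddressBook and get_or_create() are hypothetical names,
a plain dict stands in for the repository index, and the
strip/lowercase normalization is my assumption about what "the same
address" means.

```python
class EmailAddress:
    """Hypothetical stand-in for an EmailAddress item."""

    def __init__(self, address):
        self.address = address


class AddressBook:
    """Keeps one EmailAddress item per distinct address."""

    def __init__(self):
        # Plays the role of the uniqueness index over all EmailAddress items.
        self._by_address = {}

    def get_or_create(self, address):
        # Assumed normalization: ignore surrounding whitespace and case.
        cleaned = address.strip()
        key = cleaned.lower()
        item = self._by_address.get(key)
        if item is None:
            # No existing item matches, so create one and index it
            # immediately; the next lookup has to be able to find it.
            item = EmailAddress(cleaned)
            self._by_address[key] = item
        return item
```

The point to notice is that the index is read and written in the
middle of the import, so it has to be usable at every step.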
It turns out that it's OK to defer the indexes in #2 above for import
(or reload, which is similar): the UI is already being notified of
changes to the items it's displaying, so we don't need to keep all
the indexes instantaneously up-to-date.
However, in case #1, deferring indexing often leads to errors that
look like:
LookupError: Access to skiplist is denied, it is marked INVALID
because deferral has left the index in a temporarily
inconsistent state, but we're trying to iterate/insert into the
index. So, for case #1, Andi added a 'nodefer' keyword argument to
the createIndex() call, which means that these indexes will always
keep themselves consistent (i.e. essentially ignore
reindexingDeferred()). This allows us to defer indexing for the remaining indexes,
which, happily, is where the most time was previously wasted.
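
Here is a small sketch of how a 'nodefer' flag can interact with
deferral. Again this is a toy model under my own assumptions, not the
repository implementation: ToyIndex, ToyView, create_index() and
reindexing_deferred() are hypothetical names, and I am assuming that a
deferrable index is simply marked invalid until the deferral block
exits.

```python
from contextlib import contextmanager


class ToyIndex:
    """Hypothetical index that can opt out of deferral via nodefer."""

    def __init__(self, nodefer=False):
        self.nodefer = nodefer
        self.entries = set()
        self.pending = []
        self.valid = True

    def add(self, value, deferring):
        if deferring and not self.nodefer:
            # Deferrable index: queue the update and mark ourselves stale.
            self.pending.append(value)
            self.valid = False
        else:
            # nodefer index (or no deferral active): stay consistent.
            self.entries.add(value)

    def __contains__(self, value):
        if not self.valid:
            raise LookupError("index is marked INVALID")
        return value in self.entries

    def flush(self):
        # Apply queued updates and become usable again.
        self.entries.update(self.pending)
        self.pending = []
        self.valid = True


class ToyView:
    """Hypothetical view that owns indexes and a deferral flag."""

    def __init__(self):
        self.indexes = {}
        self.deferring = False

    def create_index(self, name, nodefer=False):
        self.indexes[name] = ToyIndex(nodefer=nodefer)
        return self.indexes[name]

    @contextmanager
    def reindexing_deferred(self):
        self.deferring = True
        try:
            yield
        finally:
            self.deferring = False
            for index in self.indexes.values():
                index.flush()


view = ToyView()
unique = view.create_index("emailAddress", nodefer=True)  # case #1
by_start = view.create_index("startTime")                 # case #2

with view.reindexing_deferred():
    unique.add("a@example.com", view.deferring)
    by_start.add(5, view.deferring)
    found_during = "a@example.com" in unique  # nodefer index is usable
    try:
        _ = 5 in by_start                     # deferred index is not
        deferred_usable = True
    except LookupError:
        deferred_usable = False

found_after = 5 in by_start                   # usable again after the block
```

Inside the block the nodefer index can be searched, while touching the
deferred one raises LookupError; once the block exits, the deferred
index has caught up.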
--Grant
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev