Re: [Chandler-dev] [Sum] The Great Architecture Discussion of 2007

Phillip J. Eby Wed, 10 Oct 2007 10:05:14 -0700

At 08:11 AM 10/10/2007 -0700, Andi Vajda wrote:

On Wed, 10 Oct 2007, Phillip J. Eby wrote:
I mean that by implementing a skiplist *inside* of BerkeleyDB rather than using a native BerkeleyDB structure, we're adding an "interpretation" layer there.
What is the native Berkeley DB structure that corresponds to a skiplist?
A BTree. And the native Python equivalent is a list. You may recall that at one time, I compared the performance of the skiplist implementation with a simple Python list object, and found that for lists of up to 50,000 items, Python lists were faster -- often significantly faster. This is another good example of where we are reinventing wheels in Python code that somebody already wrote a mature implementation of in C.
I don't see how a Berkeley DB B-Tree gives me positional access. Nor do I see how I can sparsely load and persist a Python list.

Berkeley lets you access portions of a record, does it not? You can read portions of a BTree into a positional list or an emulation of one, can you not? As far as I am aware, there is nothing in our requirements that requires truly *random* positional access within an index.

We use positional access for displaying lists of things, and such display is essentially an indexed sequential access, with positional access being needed within a particular window of offsets. In other words, we don't randomly choose to access item #1 now, and then #1000 -- we have strong locality or clustering of access. A simple sliding-window cache over a BTree could perhaps be sufficient.
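
A minimal sketch of that sliding-window idea, in Python. Everything named here is hypothetical and for illustration only: `fetch_range(start, count)` stands in for whatever would read a contiguous run of entries out of the BTree (how offsets map onto BTree cursors is left open), and a plain sorted list plays the part of the store.

```python
class SlidingWindow:
    """Positional access over an ordered store, caching one window of items."""

    def __init__(self, fetch_range, total, window=100):
        self.fetch_range = fetch_range  # hypothetical: (start, count) -> list
        self.total = total              # number of entries in the store
        self.window = window            # how many entries to keep cached
        self.start = 0                  # offset of the first cached entry
        self.items = []                 # the cached entries
        self.loads = 0                  # how many times we hit the store

    def __len__(self):
        return self.total

    def __getitem__(self, index):
        if not 0 <= index < self.total:
            raise IndexError(index)
        if not (self.start <= index < self.start + len(self.items)):
            # Miss: re-centre the window on the requested offset,
            # clamped so it stays inside the store.
            centred = index - self.window // 2
            self.start = max(0, min(centred, self.total - self.window))
            self.items = self.fetch_range(self.start, self.window)
            self.loads += 1
        return self.items[index - self.start]


# Stand-in for the ordered store: a sorted list sliced by offset.
backing = list(range(0, 2000, 2))  # 1000 sorted keys


def fetch_range(start, count):
    return backing[start:start + count]


view = SlidingWindow(fetch_range, total=len(backing), window=100)
visible = [view[i] for i in range(500, 530)]  # one "screen" of rows
```

Reading the thirty rows above touches the store once; further scrolling only reloads when the requested offset leaves the cached window, which is the locality of access described above.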

This is a good example of the sort of thing that happens when an architecture is created before the practical requirements are well-understood.

To be clear: I am not saying you made a bad choice. I am saying the requirements that were given to you at the time were either incomplete or misleading.

I am also not proposing that we necessarily go back and revisit this choice in the near future, but if we do (e.g., changing the repository to "compile" dynamic schema to BDB instead of using an "interpreted" approach), it needs to be in the context of an application layer that is also designed for performance.

As you have quite correctly pointed out, the repository's performance issues are dwarfed at the moment by other issues, so fixing the "interpreted database" aspect is a micro-optimization by comparison.

However, as the bigger-picture issues are fixed, these lower-level factors will eventually become a new bottleneck or "hot spot", performance-wise.

This is why I implemented skiplists. Python? I re-wrote the skiplist implementation in C years ago.

So, we reinvented that wheel twice then.... and a skiplist still must use more memory per item than a Python list does or a sliding window over a BTree would.

I'm not sure I see the point to this increasingly-detailed drilldown, though, since even if the issues regarding skiplists vs. BTrees and lists were completely resolved, it wouldn't affect any of the other points under discussion.


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
