At 08:11 AM 10/10/2007 -0700, Andi Vajda wrote:

On Wed, 10 Oct 2007, Phillip J. Eby wrote:

I mean that by implementing a skiplist *inside* of BerkeleyDB rather than using a native BerkeleyDB structure, we're adding an "interpretation" layer there.
What is the native Berkeley DB structure that corresponds to a skiplist ?

A BTree. And the native Python equivalent is a list. You may recall that at one time, I compared the performance of the skiplist implementation with a simple Python list object, and found that for lists of up to 50,000 items, Python lists were faster -- often significantly faster. This is another good example of where we are reinventing wheels in Python code that somebody already wrote a mature implementation of in C.

I don't see how a Berkeley DB B-Tree gives me positional access. Nor do I see how I can sparsely load and persist a Python list.

Berkeley lets you access portions of a record, does it not? You can read portions of a BTree into a positional list or an emulation of one, can you not? As far as I am aware, there is nothing in our requirements that requires truly *random* positional access within an index.

We use positional access for displaying lists of things, and such display is essentially an indexed sequential access, with positional access being needed within a particular window of offsets. In other words, we don't randomly choose to access item #1 now, and then #1000 -- we have strong locality or clustering of access. A simple sliding-window cache over a BTree could perhaps be sufficient.

This is a good example of the sort of thing that happens when an architecture is created before the practical requirements are well-understood.

To be clear: I am not saying you made a bad choice. I am saying the requirements that were given to you at the time were either incomplete or misleading.

I am also not proposing that we necessarily go back and revisit this choice in the near future, but if we do (e.g., changing the repository to "compile" dynamic schema to BDB instead of using an "interpreted" approach), it needs to be in the context of an application layer that is also designed for performance.

As you have quite correctly pointed out, the repository's performance issues are dwarfed at the moment by other issues, so fixing the "interpreted database" aspect is a micro-optimization by comparison.

However, as the bigger-picture issues are fixed, these lower-level factors will eventually become a new bottleneck or "hot spot", performance-wise.


This is why I implemented skiplists. Python ? I re-wrote the skiplist implementation in C years ago.

So, we reinvented that wheel twice then.... and a skiplist still must use more memory per item than a Python list does or a sliding window over a BTree would.

I'm not sure I see the point to this increasingly-detailed drill down, though, since even if the issues regarding skiplists vs. BTrees and lists were completely resolved, it wouldn't affect any of the other points under discussion.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev

Reply via email to