Hi all,

I recently stumbled across this article:
http://queue.acm.org/detail.cfm?id=1814327

Basically, the author is arguing that often, the effect of virtual memory is 
ignored, at great cost. (it's somewhat abrasive but it gets the point across). 
The comments were very helpful in showing that if you want data structures that 
behave well on multi-tiered memory, you need to use "Cache Oblivious" 
algorithms.

I only have a vague idea of the in-memory and on-disk data structures that 
CouchDB has (I know that they're B-trees), so I'm here to ask if CouchDB's 
algorithms have been designed with paging penalties in mind.

One interesting paper popped up in a search for cache oblivious B-trees:

http://www.cs.sunysb.edu/~bender/pub/PODS06-BFK.pdf
===============
ABSTRACT
B-trees are the data structure of choice for maintaining searchable data on 
disk. However, B-trees perform suboptimally
• when keys are long or of variable length,
• when keys are compressed, even when using front compression, the standard 
B-tree compression scheme,
• for range queries, and
• with respect to memory effects such as disk prefetching.

This paper presents a cache-oblivious string B-tree (COSB-tree) data structure 
that is efficient in all these ways:
• The COSB-tree searches asymptotically optimally and inserts and deletes 
nearly optimally.
• It maintains an index whose size is proportional to the front-compressed size 
of the dictionary. Furthermore, unlike standard front-compressed strings, keys 
can be decompressed in a memory-efficient manner.
• It performs range queries with no extra disk seeks; in contrast, B-trees 
incur disk seeks when skipping from leaf block to leaf block.
• It utilizes all levels of a memory hierarchy efficiently and makes good use 
of disk locality by using cache-oblivious layout strategies.
===============

Is this what CouchDB implements? If not, would implementing this be considered?

Thanks,

Wout.

Reply via email to