Good catch! That is the major drawback of JE: it lacks on-disk locality once the internal nodes can no longer be held in RAM. BDB can provide such on-disk locality. So it seems BDB JE could serve as an alternative to the memtable + tablet log, and BDB as an alternative to the SSTable.
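To get a feel for when the internal nodes stop fitting in RAM, here is a back-of-envelope sketch in Python. It assumes a full, uniform btree whose bottom internal level holds one slot per key, with a hypothetical fanout of 128 and a hypothetical 32-byte slot (separator key plus child pointer). JE's real node layout, fill factor and cache overhead differ, so treat the numbers as an order-of-magnitude illustration only. `internal_node_bytes` and `max_keys_cached` are made-up helper names, not JE APIs.

```python
import math


def internal_node_bytes(num_keys: int, fanout: int, slot_bytes: int) -> int:
    """Rough size of all internal (non-leaf) btree nodes, in bytes.

    Illustrative assumptions, not JE's real layout: every node is full,
    the bottom internal level holds one slot per key, and each slot
    costs `slot_bytes`.
    """
    total_slots = 0
    level_slots = num_keys  # slots in the bottom internal level
    while True:
        total_slots += level_slots
        if level_slots <= fanout:  # these slots fit in a single root node
            break
        # each slot one level up points at one full child node
        level_slots = math.ceil(level_slots / fanout)
    return total_slots * slot_bytes


def max_keys_cached(cache_bytes: int, fanout: int, slot_bytes: int) -> int:
    """Largest key count whose internal nodes still fit in the cache."""
    lo, hi = 0, 1
    # grow hi until it no longer fits; lo = 0 always fits
    while internal_node_bytes(hi, fanout, slot_bytes) <= cache_bytes:
        hi *= 2
    # binary search with invariant: lo fits, hi does not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if internal_node_bytes(mid, fanout, slot_bytes) <= cache_bytes:
            lo = mid
        else:
            hi = mid
    return lo


size = internal_node_bytes(1_000_000_000, fanout=128, slot_bytes=32)
print(f"{size / 1e9:.1f} GB")  # 32.3 GB under these assumptions
print(max_keys_cached(4 * 2**30, fanout=128, slot_bytes=32))
```

Under these assumptions a billion keys need roughly 32 GB of internal nodes alone, so with a cache much smaller than that, random reads start hitting disk even though the writes stay append-only.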
This is just a general discussion. I don't believe an open-source project would bet its future on a commercial product under a different license. :-)

best regards,
hanzhu

On Tue, Mar 30, 2010 at 6:45 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > A log-structured database, e.g. BDB-JE, has append-only
> > characteristics. Is it an alternative to SSTable? Those mature
> > database products may have done a lot for cache management. Not
> > sure whether it can improve read performance or not.
>
> BDB JE seems to be targeted mostly at cases where data fits in RAM,
> or reasonably close to it. A problem is that while writes will be
> append-only as long as the database is sufficiently small, you start
> taking reads once the internal btree nodes no longer fit in RAM. So,
> depending on cache size, at a certain number of keys (and thus btree
> size) you start being seek-bound on reads while writing, even though
> the writes themselves are append-only and not subject to seek
> overhead.
>
> Another effect, which I have not specifically confirmed in testing but
> expect to happen, is that once you reach this point of taking reads,
> compaction is probably going to be a lot more expensive. While
> normally JE can pick a log segment with the most garbage and mostly
> stream through it, re-writing non-garbage, that process will then
> also become entirely seek-bound if only a small subset of the btree
> fits in RAM. So now you have a seek-bound compaction process that must
> keep up with the append-only write process, meaning that your
> append-only writes are limited by said seeks in addition to any seeks
> it takes "directly" when generating the writes.
>
> Also keep in mind that JE won't have on-disk locality for either
> internal nodes or leaf (data) nodes.
>
> The guaranteed append-only nature of Cassandra, in combination with
> the on-disk locality, is one reason to prefer it, under some
> circumstances, over JE even for non-clustered local use on a single
> machine.
>
> (As an aside: I doubt JE is used very much with huge databases, since
> O(n) file listings (with respect to the number of log segments) became
> a very significant CPU bottleneck. This is probably easily patched, or
> configured away by using larger log segments, but the repeated O(n)
> file listings suggest to me that huge databases are not an expected
> use case, beyond some hints in the documentation that would indicate
> it's meant for smaller databases.)
>
> --
> / Peter Schuller
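For contrast with the seek-bound JE cleaner described in the quoted message, here is a simplified Python sketch of why SSTable-style compaction keeps its on-disk locality: it is a sequential k-way merge of key-sorted files, with no tree lookup per record. This is not Cassandra's code. It resolves duplicate keys by file age rather than by the timestamps Cassandra uses to reconcile columns, ignores tombstones, and uses in-memory lists in place of files; `compact` and `_tag` are hypothetical names.

```python
import heapq
from typing import Iterator, List, Optional, Tuple

Entry = Tuple[str, str]  # hypothetical (key, value) record


def _tag(run: List[Entry], age: int) -> Iterator[Tuple[str, int, str]]:
    # age 0 is the newest run, so for equal keys the newest sorts first
    for key, value in run:
        yield (key, age, value)


def compact(runs_newest_first: List[List[Entry]]) -> List[Entry]:
    """Merge key-sorted runs into one sorted run, keeping the newest value.

    Each input is consumed front to back exactly once and the output is
    produced in key order, so both sides are purely sequential.
    """
    streams = [_tag(run, age) for age, run in enumerate(runs_newest_first)]
    merged: List[Entry] = []
    last_key: Optional[str] = None
    for key, _age, value in heapq.merge(*streams):
        if key == last_key:
            continue  # older version of a key already emitted
        merged.append((key, value))
        last_key = key
    return merged


newest = [("a", "2"), ("c", "3")]
older = [("a", "1"), ("b", "1")]
print(compact([newest, older]))  # [('a', '2'), ('b', '1'), ('c', '3')]
```

The point of the sketch is the access pattern: the merge never needs to consult an index to decide whether a record is live, so its cost does not depend on how much of any tree fits in RAM.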