On Apr 16, 2008, at 9:51 PM, YKY (Yan King Yin) wrote:
Typically we need to retrieve many nodes from the DB to do inference.
The nodes may be scattered around the DB.  So it may require *many*
disk accesses.  My impression is that most DBMS are optimized for
complex queries but not for large numbers of simple retrievals -- am I
correct about this?


No, you are not correct about this. All good database engines use a combination of clever adaptive cache replacement algorithms (read: they keep the data you are most likely to access next in RAM) and cost-based optimization (read: they adaptively select query execution algorithms based on measured resource access costs) to optimize performance across a broad range of use cases. For highly regular access patterns (read: similar query types and complexity), the engine will converge on very efficient access paths and resource management that match that usage. For irregular access patterns, it will attempt to dynamically select the best options given recent access history and resource cost statistics -- not always the best result (on occasion hand optimization could do better), but on average more likely to produce good results than simpler rule-based optimization.
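As a concrete, scaled-down illustration of the point-lookup question, here is a minimal sketch using Python's sqlite3 module -- a far simpler engine than the ones named below, but enough to show two things: the planner chooses an index search (not a table scan) for a point lookup on an indexed column, and many scattered retrievals can be batched into one IN-list query to amortize round trips. The table and column names are invented for the example.

```python
# Minimal sketch (SQLite stands in for a full engine here).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(i, f"node-{i}") for i in range(10_000)])

# The planner reports a SEARCH (index lookup), not a SCAN (full table scan),
# for a point lookup on the primary key -- no hand-tuning required.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT payload FROM nodes WHERE id = ?", (42,)
).fetchall()
print(plan)

# Batching scattered point lookups into one query amortizes per-query overhead.
ids = [3, 997, 4242, 8191]
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, payload FROM nodes WHERE id IN ({placeholders})", ids
).fetchall()
print(rows)
```

With a real client/server engine the batching matters even more, since each round trip also pays network latency, and the buffer cache keeps hot index and table pages in RAM across queries.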

Note that by "good database engine" I am talking about engines that actually support these kinds of tightly integrated and adaptive management features: Oracle, DB2, PostgreSQL, et al. This does *not* include MySQL, which is a naive and relatively non-adaptive engine, and which scales much worse and is generally slower than PostgreSQL anyway if you are looking for a free open source solution.


I would also point out that different engines are optimized for different use cases. For example, while Oracle and PostgreSQL share the same transaction model, Oracle's design decisions optimize for massive numbers of small concurrent update transactions, while PostgreSQL's optimize for massive numbers of small concurrent insert/delete transactions. Databases based on other transaction models, such as IBM's DB2, sacrifice extreme write concurrency for superior read-only performance. There are unavoidable tradeoffs with such things, so the market has a diverse ecology of engines that have chosen different sets of tradeoffs, and buyers should be aware of what these tradeoffs are if scalable performance is a criterion.
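The shared transaction model mentioned above is multiversion concurrency control (MVCC). A toy sketch (not any real engine's implementation) of the core idea: writers append new row versions instead of overwriting in place, and each reader sees the newest version no later than its own snapshot, so readers never block on writers.

```python
# Toy MVCC sketch: versioned key/value store with snapshot reads.
# All names here are invented for illustration.
import itertools

class MVCCStore:
    def __init__(self):
        self._versions = {}            # key -> list of (txn_id, value)
        self._clock = itertools.count(1)

    def begin(self):
        """A read snapshot is just the transaction id at which the reader started."""
        return next(self._clock)

    def write(self, key, value):
        # Append a new version; never overwrite older ones in place.
        txn = next(self._clock)
        self._versions.setdefault(key, []).append((txn, value))
        return txn

    def read(self, key, snapshot):
        """Return the newest version written at or before the snapshot."""
        for txn, value in reversed(self._versions.get(key, [])):
            if txn <= snapshot:
                return value
        return None

store = MVCCStore()
store.write("x", "v1")
snap = store.begin()           # reader takes its snapshot here
store.write("x", "v2")         # a concurrent writer commits a newer version
print(store.read("x", snap))   # the old reader still sees "v1"
print(store.read("x", store.begin()))  # a fresh snapshot sees "v2"
```

The tradeoff the paragraph above alludes to falls out of this design: keeping old versions around buys write concurrency but adds version-visibility checks and cleanup work that lock-based, read-optimized designs avoid.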


J. Andrew Rogers

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/