On Apr 17, 2008, at 12:20 PM, Mark Waser wrote:
It has always been possible to tweak any of the databases to the other's transactional model.


Eh? Choices in concurrency control and scheduling run very deep in a database engine, with ramifications that cascade through every other part of the system. Nominally equivalent transaction isolation levels can behave very differently in practice depending on the internal transaction representation and management model. You cannot turn off these side effects, and you cannot "tweak" a non-MVCC-ish model to behave like an MVCC-ish model at runtime in any way that matters.
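To make the distinction concrete, here is a toy sketch (not any real engine's internals) of the MVCC idea: readers see a consistent snapshot while a writer commits concurrently, whereas a lock-based model would instead block one side on the same row. All names here are illustrative.

```python
import itertools

class MVCCStore:
    """Minimal append-only versioned store illustrating snapshot reads."""

    def __init__(self):
        self._counter = itertools.count(1)
        # key -> list of (commit_txid, value), append-only
        self._versions = {}

    def begin(self):
        # A snapshot is just the highest txid handed out so far.
        return next(self._counter)

    def write(self, key, value):
        # Each committed write gets a fresh txid and a new version.
        txid = next(self._counter)
        self._versions.setdefault(key, []).append((txid, value))
        return txid

    def read(self, snapshot, key):
        # Return the newest version visible to this snapshot;
        # versions committed after the snapshot are invisible.
        visible = [v for txid, v in self._versions.get(key, [])
                   if txid <= snapshot]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("row", "v1")        # committed before the reader starts
snap = store.begin()            # reader takes a snapshot
store.write("row", "v2")        # a writer commits afterwards
print(store.read(snap, "row"))  # prints "v1": the old snapshot is intact
```

The point of the sketch is that visibility is decided per version at read time; in a lock-based engine that decision is made by blocking, and no amount of configuration turns one mechanism into the other.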


Second of all, it was not a weakness -- it was a deliberate choice of optimization -- it was a choice of OLAP over OLTP (and, let's be honest, for most databases on limited memory machines with low OLTP requirements, this was the correct choice until ballooning memories made the reverse true).


The rise of the Internet, with its characteristically massive OLTP loads, pretty much settled the issue. It is true, though, that Oracle-like OLTP monsters have significantly higher resource overhead for storing the same set of records. These days it is concurrency bottlenecks that will kill you.


So, is your claim that Oracle distributes better than Microsoft? If so, why?


A very mature implementation of the concepts, with almost every conceivable mechanism and model for doing it hidden under the hood. Remember, they started introducing the relevant concepts ages ago in Oracle 7, though in practice it was mostly unusable until relatively recently. Consequently, their implementation is easily the most general in that it works moderately well across the broadest range of use cases, because they have been tweaking that aspect for years. Other commercial implementations tend to work only for a much narrower set of use cases. In short, Oracle has a long head start.


There are new transactional architectures in academia that should work better in a modern distributed environment than any of the current commercial adaptations of classical architectures to distributed environments.

And PostgreSQL will probably implement them long before Oracle or MS.


Ironically, a specific design decision that has generated a fair amount of argument over the years leaves PostgreSQL starting from the closest design point. PostgreSQL does not support threading and uses only a single process per query execution, originally for portability and data-safety reasons -- the extreme hackability would be difficult to achieve otherwise. This made certain types of trivial parallelism for OLAP difficult. On the other hand, it has had distributed lock functionality for a number of versions now.

If you look at newer models explicitly designed to make transactional databases scale better across distributed systems, you find that they are built on a design requirement of single processes per resource, strict access serialization, no local parallelism, and distributed locks. That is not far removed from where PostgreSQL is today, once you remove the massive local concurrency support and its high overhead. There are a number of outfits (see www.greenplum.com for a very advanced implementation) that have hacked PostgreSQL to scale across very large clusters for OLAP by essentially making the tweaks necessary to approximate these types of models. The next step would be to rip out a lot of expensive bits based on classical design assumptions that make distributed write loads scale poorly.
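The execution model those newer academic designs rely on can be caricatured in a few lines: each partition is owned by exactly one worker, transactions are routed to their partition's queue, and each worker runs its queue strictly serially, with no latches or local concurrency control at all. This is only an illustrative sketch (worker threads here stand in for the single processes the real designs use); all names are made up.

```python
from queue import Queue
from threading import Thread

class Partition:
    """One worker owns the data; transactions execute strictly serially."""

    def __init__(self):
        self.data = {}
        self.queue = Queue()
        Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Strict serialization: one transaction at a time, in arrival order.
        while True:
            txn, done = self.queue.get()
            txn(self.data)   # no locks needed -- nobody else touches self.data
            done.put(True)

    def submit(self, txn):
        done = Queue()
        self.queue.put((txn, done))
        done.get()           # wait for the "commit"

class PartitionedStore:
    """Route each transaction to the single partition that owns its key."""

    def __init__(self, n):
        self.partitions = [Partition() for _ in range(n)]

    def partition_for(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def submit(self, key, txn):
        # Single-partition transactions need no distributed coordination;
        # cross-partition ones would need the distributed locks mentioned above.
        self.partition_for(key).submit(txn)

store = PartitionedStore(4)
for _ in range(100):
    store.submit("counter",
                 lambda d: d.update(counter=d.get("counter", 0) + 1))
print(store.partition_for("counter").data)  # prints {'counter': 100}
```

Note how close this is to PostgreSQL's one-process-per-execution model with the local concurrency machinery deleted, which is the point being made above.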

In a sense, a design choice that has traditionally put some limits on scaling PostgreSQL for OLAP put it in exactly the right place to make implementing next-generation architectures as natural an evolution as can be expected in this case.


J. Andrew Rogers

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244&id_secret=101455710-f059c4
Powered by Listbox: http://www.listbox.com
