On Apr 17, 2008, at 12:20 PM, Mark Waser wrote:
> It has always been possible to tweak any of the databases to the
> other's transactional model.
Eh? Choices in concurrency control and scheduling run very deep in a
database engine, with ramifications that cascade through every other
part of the system. Equivalent transaction isolation levels can
behave very differently in practice depending on the internal
transaction representation and management model. You cannot turn off
these side-effects, and you cannot "tweak" a non-MVCC-ish model to
behave like an MVCC-ish model at runtime in any way that matters.
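To make that concrete, here is a toy Python sketch (my own illustration, not modeled on any real engine) of how the same "readers see only committed data" contract plays out under two different internal models. An MVCC-ish store lets readers proceed against a committed snapshot while a write is in flight; a lock-based store makes readers of the same item wait for the writer's exclusive lock.

```python
import threading

class MVCCStore:
    """MVCC-ish toy: readers see the last committed version and never
    block, even while a writer has uncommitted changes in flight."""
    def __init__(self):
        self.committed = {"x": 1}
        self.pending = {}

    def begin_write(self, key, value):
        self.pending[key] = value      # new version, not yet visible

    def read(self, key):
        return self.committed[key]     # snapshot of committed state

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

class LockStore:
    """Lock-based toy: the writer holds an exclusive lock until commit,
    so readers of the same item block for the duration."""
    def __init__(self):
        self.data = {"x": 1}
        self.lock = threading.Lock()

    def begin_write(self, key, value):
        self.lock.acquire()            # held until commit()
        self._key, self._value = key, value

    def read(self, key):
        with self.lock:                # blocks while a write is open
            return self.data[key]

    def commit(self):
        self.data[self._key] = self._value
        self.lock.release()
```

Both stores honor the same isolation contract on paper, but the observable behavior under concurrent load is entirely different, which is the sense in which you cannot "tweak" one model into the other at runtime.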
> Second of all, it was not a weakness -- it was a deliberate choice
> of optimization -- it was a choice of OLAP over OLTP (and, let's be
> honest, for most databases on limited memory machines with low OLTP
> requirements, this was the correct choice until ballooning memories
> made the reverse true).
The rise of the Internet, with its characteristically massive OLTP
loads, more or less settled the issue. It is true that Oracle-like OLTP
monsters have significantly higher resource overhead for storing the
same set of records. These days it is concurrency bottlenecks that
will kill you.
> So, is your claim that Oracle distributes better than Microsoft? If
> so, why?
Very mature implementation of the concepts, and almost every
conceivable mechanism and model for doing it is hidden under the
hood. Remember, they started introducing the relevant concepts ages
ago in Oracle 7, though in practice it was mostly unusable until
relatively recently. Consequently, their implementation is easily
the most general in that it works moderately well across the broadest
range of use cases because they've been tweaking that aspect for
years. Other commercial implementations tend to only work for a much
narrower set of use cases. In short, Oracle has a long head start.
There are new transactional architectures in academia that should
work better in a modern distributed environment than any of the
current commercial adaptations of classical architectures to
distributed environments.
And PostgreSQL will probably implement them long before Oracle or MS.
Ironically, a specific design decision that has created a fair amount
of argument for years makes PostgreSQL the engine starting from the
closest design point. PostgreSQL does not support threading and only
uses a single process per query execution, originally for portability
and data safety reasons -- the extreme hackability would be difficult
to achieve otherwise. This made certain types of trivial parallelism for
OLAP difficult. On the other hand, it has had distributed lock
functionality for a number of versions now.
If you look at newer models explicitly designed to make transactional
databases scale better across distributed systems, you find that they
are built on a design requirement of single processes per resource,
strict access serialization, no local parallelism, and distributed
locks. That is not far removed from where PostgreSQL is today, if you
strip out the massive local concurrency support and its high overhead.
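The shape of that design point can be sketched in a few lines of Python (a hypothetical toy, not any shipping system): exactly one worker owns each partition, transactions are applied strictly in arrival order, and no local locking or latching exists at all.

```python
import queue
import threading

class Partition:
    """One worker thread owns the partition; transactions execute
    serially in arrival order, so the data needs no locks of its own."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            txn, done = self.inbox.get()
            txn(self.data)            # serial execution on owned data
            done.set()

    def submit(self, txn):
        """Queue a transaction (a function over the partition's data)
        and return an event that fires once it has been applied."""
        done = threading.Event()
        self.inbox.put((txn, done))
        return done

# Strict access serialization: each transaction sees the full effects
# of every transaction submitted before it, with no concurrency control.
p = Partition("accounts")
p.submit(lambda d: d.__setitem__("alice", 100)).wait()
p.submit(lambda d: d.__setitem__("alice", d["alice"] - 30)).wait()
```

Cross-partition transactions would then coordinate through distributed locks over these single-owner queues rather than through any local lock manager, which is the part classical engines carry as pure overhead in this model.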
There are a number of outfits (see www.greenplum.com for a very
advanced implementation) that have hacked PostgreSQL to scale across
very large clusters for OLAP by essentially making the necessary
tweaks to approximate these types of models. The next step would be
to rip out a lot of expensive bits based on classical design
assumptions that make distributed write loads scale poorly.
In a sense, a design choice that has traditionally put some limits on
scaling PostgreSQL for OLAP put it in exactly the right place to make
implementation of next-generation architectures as natural an
evolution as can be expected in this case.
J. Andrew Rogers
-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/