Goal:  Morph Firebird into a high performance, distributed, fully redundant
database.

Both JRDs (DEC Rdb/ELN and Interbase) were designed as elastic database
systems for VAXClusters.  They are share-nothing engines, communicating
exclusively through a distributed lock manager and network-attached disks.
The flaw in this architecture, as Oracle learned with early versions of RAC,
is that transferring a disk page between server processors required a disk
write followed by a disk read, resulting in unacceptable latency.

The phases listed below are conceptual.  Implementation could combine
phases or implement them out of order.

Phase 0 (Hybrid classic / superserver)

Classic synchronizes page access with lock manager controlled page locks,
allowing multiple processes to share the database disk file.  Superserver
uses lightweight latches to synchronize page access between threads.  A
hybrid, implemented in Vulcan, supported an engine in which both mechanisms
were active.  This is the engine from which subsequent phases evolve.
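The two mechanisms can be layered, which a minimal sketch may help make
concrete.  All names here (HybridPageSync, StubLockManager) are hypothetical
illustrations, not Vulcan's actual interfaces: an in-process latch guards
against other threads, while a (stubbed) lock manager guards against other
processes.

```python
import threading

class StubLockManager:
    """Hypothetical stand-in for the classic inter-process lock manager."""
    def __init__(self):
        self.held = {}
    def acquire(self, page_no, mode):
        self.held[page_no] = mode
    def release(self, page_no):
        self.held.pop(page_no, None)

class HybridPageSync:
    """Sketch of hybrid page synchronization: latch for threads in this
    process, lock manager for other processes sharing the database file."""
    def __init__(self, lock_manager):
        self.latch = threading.Lock()     # superserver-style latch
        self.lock_manager = lock_manager  # classic-style lock manager

    def fetch_for_write(self, page_no):
        self.lock_manager.acquire(page_no, "exclusive")  # vs. other processes
        self.latch.acquire()                             # vs. other threads

    def release(self, page_no):
        self.latch.release()
        self.lock_manager.release(page_no)

sync = HybridPageSync(StubLockManager())
sync.fetch_for_write(42)
assert sync.lock_manager.held[42] == "exclusive"
sync.release(42)
assert 42 not in sync.lock_manager.held
```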

Phase 1 (Page server)

A network-accessible page server process replaces the physical I/O layer
(PIO) in the hybrid engine.  The page server maintains its own write-back
cache, so a dirty page transfer between two engines requires only network
transfers, leaving the dirty page queued in the page server for eventual
write to disk.
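A toy model of that cache behavior, with hypothetical method names (put_page,
get_page, flush) standing in for whatever the real wire protocol would be:
a dirty page handed to the page server is immediately visible to another
engine without any disk I/O having occurred.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch

class PageServer:
    """Sketch of a page server with a write-back cache replacing the
    engine's physical I/O layer.  Disk is simulated with a dict."""
    def __init__(self):
        self.cache = {}   # page_no -> image; dirty pages queued here
        self.disk = {}    # simulated backing store
        self.disk_writes = 0

    def put_page(self, page_no, image):
        # An engine hands over a dirty page; no disk write happens yet.
        self.cache[page_no] = image

    def get_page(self, page_no):
        # Serve from cache if queued there; otherwise read the "disk".
        if page_no in self.cache:
            return self.cache[page_no]
        return self.disk.get(page_no, bytes(PAGE_SIZE))

    def flush(self):
        # Eventual write of queued dirty pages to disk.
        for page_no, image in self.cache.items():
            self.disk[page_no] = image
            self.disk_writes += 1
        self.cache.clear()

ps = PageServer()
ps.put_page(7, b"dirty page image")   # engine A transfers a dirty page
assert ps.get_page(7) == b"dirty page image"  # engine B sees it: network only
assert ps.disk_writes == 0            # no disk write/read round trip
ps.flush()
assert ps.disk[7] == b"dirty page image"
```

Compare this with the VAXCluster-era flaw described above: the inter-engine
transfer never touches the disk, only the page server's memory.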

Phase 2 (Distributed lock manager)

The Firebird lock manager is replaced by a distributed lock manager (home
brew or existing).  Interbase originally used the VMS distributed lock
manager.  All other Interbase lock managers were built to emulate VMS
distributed lock manager semantics (not a big deal, as the DLM was designed
by Steve Beckhardt specifically for DBMS systems).
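For readers unfamiliar with those semantics, the heart of the VMS DLM is six
lock modes (NL, CR, CW, PR, PW, EX) and their compatibility matrix, sketched
below.  The matrix itself follows the VMS model; the function name is just
illustrative.

```python
# VMS-style DLM lock modes: Null, Concurrent Read, Concurrent Write,
# Protected Read, Protected Write, Exclusive.
COMPAT = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},  # Null: compatible with all
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},                    # shared readers coexist
    "PW": {"NL", "CR"},
    "EX": {"NL"},                                # Exclusive: only Null
}

def compatible(held, requested):
    """True if a lock in mode `requested` can be granted while another
    holder has mode `held` on the same resource."""
    return requested in COMPAT[held]

assert compatible("PR", "PR")        # two protected readers
assert not compatible("PW", "PR")    # protected writer blocks new readers
assert not compatible("EX", "CR")    # exclusive blocks everything but NL
```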

Phase 3 (Page delta logging)

Up to this point, page writes require that full dirty pages be sent to the
page server to be written.  In this phase, the engine sends page changes
(deltas) to the page server to be applied to database pages.  This reduces
the network traffic for a TIP change, for example, from 4K bytes to about a
dozen bytes.  It also allows batching of many page change records into a
single physical block for transfer to the page server.  The page server
applies the page change records to pages in its cache before writing the
page or sending a page image to another engine process.  For restartability,
it may be useful to post page change records to a non-buffered serial log
prior to applying commit records.
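To make the "4K bytes down to about a dozen" arithmetic concrete, here is a
sketch of one possible page change record: page number, byte offset, length,
then the new bytes.  The wire format is hypothetical; the point is that a
tiny state change travels as a handful of bytes instead of a full page.

```python
import struct

def encode_delta(page_no, offset, data):
    """Hypothetical page change record: 4-byte page number, 2-byte offset,
    2-byte length, then the replacement bytes."""
    return struct.pack("<IHH", page_no, offset, len(data)) + data

def apply_delta(pages, record, page_size=4096):
    """Apply one page change record to an in-memory page cache."""
    page_no, offset, length = struct.unpack_from("<IHH", record)
    data = record[8:8 + length]
    page = bytearray(pages.get(page_no, bytes(page_size)))
    page[offset:offset + length] = data
    pages[page_no] = bytes(page)

# A TIP state change touches only a couple of bits on the page:
rec = encode_delta(page_no=12, offset=100, data=b"\x03")
assert len(rec) == 9          # vs. 4096 bytes for a full page image
pages = {}
apply_delta(pages, rec)
assert pages[12][100] == 3
```

Batching then amounts to concatenating many such records into one physical
block before handing it to the page server.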

Phase 4 (Peer to peer distribution)

Another lock type, Existence, is defined between None and Shared, used to
indicate that a process has an otherwise unlocked copy of a page in its
cache.  When an engine modifies a page (with an Exclusive lock, of course),
page change messages are sent to all processes with Existence locks on that
page, enabling other engines to maintain up-to-date versions of pages not
actively in use.  If an engine requires access to such a page, it requires
only the page lock, skipping the page transfer.  This does break the
layering between the engine and distributed lock manager, requiring a custom
or semi-custom lock manager.
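The broadcast mechanism can be sketched as follows.  The lock levels and
class names are illustrative only; the essential idea from the text is that
the lock table must be able to enumerate Existence holders of a page so the
writer can push page change messages to them.

```python
# Hypothetical lock levels, with Existence inserted between None and Shared.
NONE, EXISTENCE, SHARED, EXCLUSIVE = 0, 1, 2, 3

class LockTable:
    """Sketch of a lock table that can enumerate Existence holders."""
    def __init__(self):
        self.holders = {}  # page_no -> {engine_name: level}

    def set_level(self, page_no, engine, level):
        self.holders.setdefault(page_no, {})[engine] = level

    def existence_holders(self, page_no, writer):
        return [name for name, lvl in self.holders.get(page_no, {}).items()
                if lvl == EXISTENCE and name != writer]

class Engine:
    def __init__(self, name):
        self.name = name
        self.inbox = []    # received page change messages
    def receive_delta(self, delta):
        self.inbox.append(delta)

locks = LockTable()
a, b = Engine("A"), Engine("B")
engines = {"A": a, "B": b}
locks.set_level(5, "A", EXCLUSIVE)   # A is modifying page 5
locks.set_level(5, "B", EXISTENCE)   # B holds an unlocked cached copy
# On modify, A broadcasts the page change to all Existence holders:
for name in locks.existence_holders(5, "A"):
    engines[name].receive_delta((5, b"delta bytes"))
assert b.inbox == [(5, b"delta bytes")]
assert a.inbox == []
```

Note the layering break the text mentions: the engine must ask the lock
manager *who* holds Existence locks, not merely whether a lock is grantable.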

Phase 5 (Redundant page servers)

With Phase 4 in place, additional page servers can be added where each page
server has automatic Existence locks on all pages.  If necessary, a page
server may need to read a page before it can apply a page change record.
The tricky part is synchronizing a page server entering (or re-entering!) a
Firebird cluster, in which case it needs to request page images from another
page server prior to applying page change records.
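That resynchronization step can be sketched as: fetch a base page image from
a peer page server, then replay the queued page change records on top of it.
The function names and delta format here are hypothetical.

```python
def apply(cache, page_no, delta):
    """Apply one (offset, data) change record to a cached page image."""
    offset, data = delta
    page = bytearray(cache[page_no])
    page[offset:offset + len(data)] = data
    cache[page_no] = bytes(page)

def resync_page(joiner_cache, peer_pages, page_no, pending_deltas):
    """Sketch of a page server (re)entering the cluster: request the page
    image from a peer, then apply the page change records queued while the
    image was in flight."""
    joiner_cache[page_no] = peer_pages[page_no]   # base image from a peer
    for delta in pending_deltas:                  # then replay the deltas
        apply(joiner_cache, page_no, delta)

peer = {3: bytes(16)}                 # established page server's copy
joiner = {}                           # page server entering the cluster
resync_page(joiner, peer, 3, [(0, b"\x01"), (1, b"\x02")])
assert joiner[3][:2] == b"\x01\x02"
```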

At this point, Firebird is an elastically scalable, fully redundant
database system running on commodity servers without the need for exotic
hardware.  Multi-client servers can be started on as many machines as
necessary to meet performance and availability requirements.  Redundant
page servers can be added to provide arbitrary levels of durability.  The
gating limits on performance are network bandwidth and disk bandwidth on
page servers to eventually write disk pages (unlike legacy Firebird,
however, many page changes can be applied to a physical page before it
eventually gets written to disk).

And, happily, this extended architecture is fully compatible with a single
process multi-client server running with latches and local disk without a
running page server, so it has no impact on low end and entry level
instances.

I leave it to the project whether or not it wants to pursue this
architecture.  If so, I'm available as an advisor.  The design is intended
to allow implementation in phases, so the system is never broken.  All that
is necessary is will and flexibility of mind.


-- 
Jim Starkey
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
