To elaborate somewhat, a very small number of changes would need to be made
to the Firebird engine -- the essential architecture is already in place.
The changes are mostly to DPM and below. Ironically, the code to generate
page change records used to be part of the code base to support long-term
journalling. Some of that code may still exist.
The major new piece is a separate component, the page server. By Phase 5,
it will be necessary to integrate page change and lock manager traffic, so
perhaps a better name would be page/lock server. Code historians may find
existing references to an ancient page/lock server, circa 1987, which I
never completed.
At the end of Phase 5, Firebird would run exactly as it does now except:
1. Pages would be fetched from the page/lock server rather than from disk.
2. Page change records would be queued for delivery to the page/lock
server.
3. The message queue to the page/lock server would be flushed when full or
when a page with pending page changes is released (see the sketch after
this list).
4. The engine would post page changes from other instances to pages held
in cache under an Existence lock.
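A minimal sketch of what a page change record and that flush rule might
look like, in C++. Every name here is mine and purely illustrative; none
of it comes from the existing Firebird code base:

    #include <cstdint>
    #include <vector>

    // Illustrative only: a page change (delta) record as an engine would
    // queue it -- "replace these bytes at this offset on this page".
    struct PageChangeRecord
    {
        uint32_t pageNumber;            // database page being changed
        uint16_t offset;                // byte offset of the change
        std::vector<uint8_t> newData;   // replacement bytes
    };

    // Rule 3 above: flush when the block fills or when a page with
    // pending changes is released.
    struct PageChangeQueue
    {
        std::vector<PageChangeRecord> pending;
        size_t pendingBytes = 0;
        static constexpr size_t FLUSH_THRESHOLD = 1400;  // ~ one Ethernet payload

        void post(PageChangeRecord record, bool releasingChangedPage)
        {
            pendingBytes += record.newData.size() + 8;   // data + a small header
            pending.push_back(std::move(record));
            if (pendingBytes >= FLUSH_THRESHOLD || releasingChangedPage)
                flushToPageServer();
        }

        void flushToPageServer()
        {
            // One network write of the whole block to the page/lock server,
            // then reset the queue (the network layer is omitted here).
            pending.clear();
            pendingBytes = 0;
        }
    };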
The really nifty part of the architecture is that pages themselves are
never written by an engine. The page/lock server applies page changes,
writing its updated pages to disk at its convenience. In the meantime,
each engine applies page changes to pages in its cache (locked with
Existence), so an engine can acquire a Shared or Exclusive lock on such a
page simply by requesting (and receiving) the lock, with no page transfer
(sketched in code after the list below). This works because:
1. Page change records will always be transmitted before that page is
unlocked by an engine.
2. If another engine requests a lock on that page, it will always receive
any pending page change records before the lock grant record.
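A minimal sketch of the engine side of that ordering guarantee, reading
its connection to the page/lock server as an ordered stream. Again, the
types and names are hypothetical, invented for illustration:

    #include <cstdint>

    enum class MsgType { PageChange, LockGrant /* , ... */ };

    struct Message
    {
        MsgType  type;
        uint32_t pageNumber;
        // payload omitted
    };

    struct Connection { Message readNextMessage(); };
    struct PageCache  { void applyChange(const Message& change); };

    // Because the stream is ordered, any page change records queued ahead
    // of the grant are applied before the lock is acted on (point 2 above).
    void waitForLockGrant(Connection& server, PageCache& cache,
                          uint32_t pageNumber)
    {
        for (;;)
        {
            const Message msg = server.readNextMessage();
            if (msg.type == MsgType::PageChange)
                cache.applyChange(msg);        // keep the cached copy current
            else if (msg.type == MsgType::LockGrant &&
                     msg.pageNumber == pageNumber)
                return;                        // the page is now up to date
        }
    }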
The net volume of network traffic will be vastly less than in current
Firebird because a) only page changes are transmitted, and b) page change
records are batched into Ethernet-packet-sized blocks (when possible).
Add to that the fact that a gigabit Ethernet round trip takes about 100
microseconds while a disk transfer takes about 6 milliseconds (on a good
and lucky day), and the result is many fewer I/Os, each of them much
faster.
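A rough back-of-envelope using the figures above plus the ~12-byte TIP
delta from Phase 3 (the ~1400 bytes of usable frame payload is my
assumption, not a measured number):

    ~1400 bytes per Ethernet frame / ~12 bytes per delta  ->  100+ deltas per packet
    one gigabit Ethernet round trip                       ~=  100 microseconds
    one disk transfer                                     ~=  6,000 microseconds

So a single batched round trip can carry on the order of a hundred page
changes in roughly one sixtieth of the time of one disk transfer.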
The tricky parts all revolve around the issues of multiple page/lock
servers and have almost no effect on the Firebird engine per se.
Questions? Brickbats? Rebuttals?
On Sunday, March 22, 2015, James Starkey <[email protected]> wrote:
> Goal: Morph Firebird into a high performance, distributed, fully
> redundant database.
>
> Both JRDs (DEC Rdb/ELN and Interbase) were designed as elastic database
> systems for VAXClusters. They are share-nothing engines, communicating
> exclusively through a distributed lock manager and network-attached disks.
> The flaw in this architecture, as Oracle learned with early versions of
> RAC, is that transferring a disk page between server processors required a
> disk write followed by a disk read, resulting in unacceptable latency.
>
> The phases listed below are conceptual. Implementation could combine
> phases or implement them out of order.
>
> Phase 0 (Hybrid classic / superserver)
>
> Classic synchronizes page access with lock-manager-controlled page locks,
> allowing multiple processes to share the database disk file. Superserver
> uses lightweight latches to synchronize page access between threads. A
> hybrid mode, implemented in Vulcan, supported an engine in which both
> mechanisms were active. This is the engine from which the subsequent
> phases evolve.
>
> Phase 1 (Page server)
>
> A network-accessible page server process replaces the physical I/O layer
> (PIO) in the hybrid engine. The page server maintains its own write-back
> cache, so a dirty page transfer between two engines requires only network
> transfers, leaving the dirty page queued in the page server for eventual
> write to disk.
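A minimal sketch of that page-server side, assuming a simple in-memory
cache keyed by page number. All names are invented for illustration and do
not correspond to anything in the Firebird sources:

    #include <cstdint>
    #include <map>
    #include <vector>

    struct CachedPage
    {
        std::vector<uint8_t> image;   // current page contents
        bool dirty = false;           // queued for an eventual disk write
    };

    class PageServer
    {
    public:
        // Serve a fetch request from the write-back cache; a dirty page
        // handed to another engine never has to touch the disk first.
        const std::vector<uint8_t>& fetchPage(uint32_t pageNumber)
        {
            auto it = cache.find(pageNumber);
            if (it == cache.end())
                it = cache.emplace(pageNumber,
                        CachedPage{ readFromDisk(pageNumber), false }).first;
            return it->second.image;
        }

    private:
        std::vector<uint8_t> readFromDisk(uint32_t pageNumber);  // replaces PIO
        std::map<uint32_t, CachedPage> cache;
    };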
>
> Phase 2 (Distributed lock manager)
>
> The Firebird lock manager is replaced by a distributed lock manager (home
> brew or existing). Interbase originally used the VMS distributed lock
> manager. All other Interbase lock managers were built to emulate VMS
> distributed lock manager semantics (not a big deal, as the DLM was
> designed by Steve Beckhardt specifically for DBMS systems).
>
> Phase 3 (Page delta logging)
>
> Up to this point, page writes require that full dirty pages be sent to
> the page server to be written. In this phase, the engine sends page
> changes (deltas) to the page server to be applied to database pages. This
> reduces the network traffic for a TIP change, for example, from 4K bytes
> to about a dozen bytes. It also allows batching of many page change
> records into a single physical block for transfer to the page server. The
> page server applies the page change records to pages in its cache before
> writing the page or sending a page image to another engine process. For
> restartability, it may be useful to post page change records to a
> non-buffered serial log prior to applying commit records.
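To make the apply step concrete, a sketch of what the page server might do
with one incoming change record, reusing the illustrative PageChangeRecord
and CachedPage types from the sketches earlier in this mail; the SerialLog
reflects the restartability note above and is equally hypothetical:

    #include <algorithm>
    #include <map>

    struct SerialLog { void append(const PageChangeRecord& rec); };

    // Illustrative only: force the record to a non-buffered serial log,
    // then patch the cached page image; the page goes to disk at leisure.
    void applyChange(std::map<uint32_t, CachedPage>& cache,
                     SerialLog& log,
                     const PageChangeRecord& rec)
    {
        log.append(rec);                            // unbuffered, restartable
        CachedPage& page = cache[rec.pageNumber];   // assumes the image is
                                                    // already cached (else read
                                                    // it from disk first)
        std::copy(rec.newData.begin(), rec.newData.end(),
                  page.image.begin() + rec.offset);
        page.dirty = true;                          // written at leisure
    }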
>
> Phase 4 (Peer to peer distribution)
>
> Another lock type, Existence, is defined between None and Shared, used to
> indicate that a process has an otherwise unlocked copy of a page in its
> cache. When an engine modifies a page (with an Exclusive lock, of course),
> page change messages are sent to all processes with Existence locks on that
> page, enabling other engines to maintain up-to-date versions of pages not
> actively in use. If an engine requires access to such a page, it requires
> only a page lock, skipping the page transfer. This does break the layering
> between the engine and the distributed lock manager, requiring a custom or
> semi-custom lock manager.
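The new mode slots into the lock hierarchy roughly like this (a sketch with
simplified mode names; the real lock manager has more modes and its own
compatibility rules):

    // Illustrative only: Existence sits between None and Shared.  It says
    // "I hold a copy of this page in cache, keep sending me its deltas"
    // without granting any right to read or modify the page directly.
    enum class PageLockMode
    {
        None,        // no interest in the page
        Existence,   // cached copy; wants page change broadcasts
        Shared,      // may read the page
        Exclusive    // may modify the page (and broadcasts its deltas)
    };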
>
> Phase 5 (Redundant page servers)
>
> With Phase 4 in place, additional page servers can be added, where each
> page server has automatic Existence locks on all pages. If necessary, a
> page server may need to read a page before it can apply a page change
> record. The tricky part is synchronizing a page server entering (or
> re-entering!) a Firebird cluster, in which case it needs to request page
> images from another page server prior to applying page change records.
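That join sequence, sketched under the same illustrative names as before;
buffering deltas until the peer's image arrives is my reading of the
paragraph above, not a settled design:

    struct PeerConnection
    {
        void requestPageImage(uint32_t pageNumber);
        bool pageImageArrived(uint32_t pageNumber);
        PageChangeRecord readNextChangeRecord();
    };

    // Illustrative only: a page server joining (or rejoining) the cluster
    // asks a peer for the current image of a page and holds any deltas that
    // arrive in the meantime, applying them in order once the image is in
    // its own cache.
    void synchronizePage(std::map<uint32_t, CachedPage>& cache,
                         SerialLog& log, PeerConnection& peer,
                         uint32_t pageNumber)
    {
        std::vector<PageChangeRecord> held;
        peer.requestPageImage(pageNumber);
        while (!peer.pageImageArrived(pageNumber))
            held.push_back(peer.readNextChangeRecord());   // buffer deltas
        for (const PageChangeRecord& rec : held)
            applyChange(cache, log, rec);                  // catch up in order
    }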
>
> At this point, Firebird is an elastically scalable, fully redundant
> database system running on commodity servers without the need for exotic
> hardware. Multi-client servers can be started on as many machines as
> necessary to meet performance and availability requirements. Redundant
> page servers can be added to provide arbitrary levels of durability. The
> gating limits on performance are network bandwidth and the disk bandwidth
> needed by page servers to eventually write disk pages (unlike legacy
> Firebird, however, many page changes could be applied to a physical page
> before it eventually gets written to disk).
>
> And, happily, this extended architecture is fully compatible with a
> single-process multi-client server running with latches and a local disk,
> without a running page server, so it has no impact on low-end and
> entry-level instances.
>
> I leave it to the project whether or not it wants to pursue this
> architecture. If so, I'm available as an advisor. The design is intended
> to allow implementation in phases, so the system is never broken. All
> that is necessary is will and flexibility of mind.
>
>
> --
> Jim Starkey
>
--
Jim Starkey