All,
In v4 Beta, replication is fully driven by the configuration file. In
particular, tables to be replicated are (optionally) defined using two
regexp-based options: include_filter and exclude_filter. This was the
easiest solution that doesn't require ODS changes, and it matches the
trace/audit configuration, so it is already familiar to Firebird DBAs.
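
To make the current behaviour concrete, here is a minimal sketch of how such
a pair of optional filters could be evaluated. It is an illustration only,
not the engine's code: the function name is hypothetical, Python's re syntax
stands in for whatever regexp dialect the server actually uses, and I assume
the whole table name must match and that an unset filter imposes no
restriction.

```python
import re
from typing import Optional


def is_replicated(table: str,
                  include_filter: Optional[str] = None,
                  exclude_filter: Optional[str] = None) -> bool:
    """Return True if changes to `table` pass both optional filters."""
    # Assumption: an unset include filter admits every table.
    if include_filter is not None and not re.fullmatch(include_filter, table):
        return False
    # Assumption: an unset exclude filter rejects nothing.
    if exclude_filter is not None and re.fullmatch(exclude_filter, table):
        return False
    return True
```

With include_filter set to "ORD.*|CUST.*" and exclude_filter set to ".*_TMP",
ORDERS passes while ORDERS_TMP and LOG do not.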
However, I don't think this is flexible enough. IMO, there's much sense
in separating "what is replicated" from "how it's replicated". The
former is a part of the CDC (change data capture) interface and defines
what changes we need to collect. The latter is a set of implementation
details belonging to either the built-in replicator engine or a 3rd party
CDC plugin - caching rules, transport options, etc. Such a separation
would allow us to build a really flexible architecture.
Moving this idea further, it's worth making the "what is replicated"
part controlled declaratively, using SQL. I.e. the DBA defines some
"replication set" by including or excluding tables, this set is stored
inside the database and used by the CDC publisher to filter out
unnecessary changes before passing them to the CDC handler. These
include/exclude rules are also replicated.
I see at least the following benefits in this approach:
1) Allow custom CDC solutions without interacting with the built-in
replication configuration
2) Automatic setup for cascaded replication - every replica knows what
tables are allowed for replication and reuses these rules without any
explicit include/exclude settings
3) We may allow modification of non-replicated tables in read-only
replicas - this will not cause replication conflicts
4) With some additional effort, we could allow the replication set to
be changed at runtime, without restarting the server (it could be bad
practice in general, but perhaps useful to quickly fix some
configuration mistake)
Dimitry Sibiryakov has kindly provided a pull request implementing this
feature using the CREATE|ALTER TABLE extensions. It uses RDB$FLAGS for
storage and thus doesn't require ODS changes. I suppose this can be
accepted as a straightforward solution for v4. However, it may be
somewhat limited in the long term. So I'd like to have it discussed
before accepting the PR.
One thing I'm worried about is whether it's enough to have a single
global replication set or maybe it's useful to have many independent
replication sets. Some examples of how they could be used:
1) Two slightly different global replication sets are defined, only one
of them is active at a time, but we can switch between them (e.g. via
enable/disable commands)
2) Different tables (separated by some rule) are included into different
replication sets which are all active together, their union is
used by the CDC publisher. This may be useful if these replication sets
have some declarative customizations (see below).
3) Different replication sets are declared as intended for different CDC
plugins. This implies that multiple CDC plugins may be configured
independently. In this case the CDC publisher checks the source (table)
against the target (plugin) before sending the changes.
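
A rough sketch of how the publisher could resolve these three cases with one
lookup follows. Everything in it is hypothetical (the ReplicationSet
structure, the field names, the targets_for function); in particular I
assume that a table is published when any active set includes it, and that
a set without a plugin name applies to every configured plugin.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ReplicationSet:
    name: str
    tables: Set[str]
    active: bool = True            # case 1: sets can be switched on and off
    plugin: Optional[str] = None   # case 3: None means "any plugin"


def targets_for(table: str,
                sets: List[ReplicationSet],
                plugins: List[str]) -> List[str]:
    """Return the plugins that should receive changes made to `table`."""
    result = []
    for plugin in plugins:
        for rs in sets:
            # Case 2: any active set that includes the table is enough.
            if rs.active and table in rs.tables and rs.plugin in (None, plugin):
                result.append(plugin)
                break
    return result
```

Switching between two global sets (case 1) then means flipping the active
flag on each of them, with no change to the publisher itself.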
These cases are purely theoretical, but I believe we should consider
them and decide whether it's worth preparing for them or not.
Second, IMHO declaring tables as "publishable" via CREATE|ALTER TABLE is
too restrictive. I'd rather manage the replication set using some global
commands, be it ALTER DATABASE or something different, that allow one to
include/exclude all tables at once, or a comma-separated list of tables,
or maybe tables by mask (regexp?). Of course, both SQL solutions
(database level and table level) may co-exist.
Finally, if we consider the replication set to be a filter, it may
also be useful to limit the published change set to some particular
operations (INSERT|UPDATE|DELETE) or even some particular rows (WHERE
filter). I doubt this is useful for replication per se, but this may
allow something similar to "change views" in InterBase, currently with a
CDC plugin acting as a client, but perhaps it could be extended later to
interact with the real client application.
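
To illustrate what such a per-table rule might carry, here is a small
sketch. The TableRule structure and should_publish function are
hypothetical, and a Python callable stands in for what would really be a
stored SQL WHERE condition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class TableRule:
    # Assumption: by default all three operations are published.
    operations: FrozenSet[str] = frozenset({"INSERT", "UPDATE", "DELETE"})
    # Stand-in for a WHERE filter; None means "every row".
    row_filter: Optional[Callable[[Row], bool]] = None


def should_publish(rule: TableRule, operation: str, row: Row) -> bool:
    """Return True if this change passes the operation and row filters."""
    if operation not in rule.operations:
        return False
    if rule.row_filter is not None and not rule.row_filter(row):
        return False
    return True
```

For example, a rule limited to INSERT with a filter on AMOUNT > 100 would
publish only the large new rows and drop all updates and deletes.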
And one partially related question from another angle: does it make
sense to also implement replica-side declarative filtering? I mean the
case where changes for all tables are journaled but for some reason only
some tables should be applied to a replica - e.g. two independent replicas
with different filters but replicated from the same master journal (to
avoid double journaling). If this feature is desirable, then how should
the master-side filter (replication set) co-exist with the replica-side
filter?
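
One possible answer, sketched below purely as a starting point for
discussion, is to treat the two filters as successive stages: the master
journals only what its replication set allows, and each replica then
applies only the part of the journal that passes its own filter. The
function names and the (table, operation) tuple are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Change = Tuple[str, str]  # (table name, operation)


def journal_changes(changes: Iterable[Change],
                    replication_set: Set[str]) -> List[Change]:
    """Master side: keep only changes to tables in the replication set."""
    return [c for c in changes if c[0] in replication_set]


def apply_on_replica(journal: Iterable[Change],
                     replica_filter: Set[str]) -> List[Change]:
    """Replica side: keep only journaled changes this replica wants."""
    return [c for c in journal if c[0] in replica_filter]
```

Under this assumption a replica can never receive a table that the master
excluded, so two replicas reading the same journal can differ only within
the master's replication set.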
Please provide your feedback on these questions. I'm not talking about
implementing everything in FB4, I just need to understand how to build
the foundation that could be extended later with minimal effort.
Dmitry
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel