All,

In v4 Beta, replication is fully driven by the configuration file. In particular, the tables to be replicated are (optionally) defined using two regexp-based options: include_filter and exclude_filter. This was the easiest solution that doesn't require ODS changes, and it matches the trace/audit configuration, so it is familiar to Firebird DBAs.
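
For readers who haven't used these options, here is a rough Python sketch of how such a pair of filters could be evaluated. It is only an illustration: the function name is hypothetical, Python's `re` stands in for the pattern syntax the server actually uses, and the rule that an exclude match overrides an include match is my assumption here.

```python
import re
from typing import Optional


def is_replicated(table: str,
                  include_filter: Optional[str] = None,
                  exclude_filter: Optional[str] = None) -> bool:
    """Decide whether a table passes the include/exclude filters.

    Assumptions (not taken from the engine sources): an unset filter
    imposes no restriction, patterns must match the whole table name,
    and an exclude match wins over an include match.
    """
    if include_filter is not None and not re.fullmatch(include_filter, table):
        return False
    if exclude_filter is not None and re.fullmatch(exclude_filter, table):
        return False
    return True
```

With no filters set, every table passes; with both set, a table must match include_filter and must not match exclude_filter.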

However, I don't think this is flexible enough. IMO, it makes a lot of sense to separate "what is replicated" from "how it's replicated". The former is part of the CDC (change data capture) interface and defines what changes we need to collect. The latter is a set of implementation details belonging to either the built-in replicator engine or a 3rd party CDC plugin: caching rules, transport options, etc. Such a separation would allow us to build a really flexible architecture.

Taking this idea further, it's worth making the "what is replicated" part controlled declaratively, using SQL. I.e. the DBA defines a "replication set" by including or excluding tables; this set is stored inside the database and used by the CDC publisher to filter out unnecessary changes before passing them to the CDC handler. These include/exclude rules are also replicated.
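
To make the data flow concrete, here is a minimal Python sketch of a publisher consulting a replication set before calling the handler. All names (`Change`, `ReplicationSet`, `publish`) are hypothetical and exist only for illustration; in the real engine the set would live in the database metadata, not in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Set


@dataclass
class Change:
    table: str
    operation: str  # "INSERT", "UPDATE" or "DELETE"


@dataclass
class ReplicationSet:
    """Hypothetical in-memory model of the set of published tables."""
    tables: Set[str] = field(default_factory=set)

    def include(self, *names: str) -> None:
        self.tables.update(names)

    def exclude(self, *names: str) -> None:
        self.tables.difference_update(names)

    def contains(self, table: str) -> bool:
        return table in self.tables


def publish(changes: Iterable[Change],
            repl_set: ReplicationSet,
            handler: Callable[[Change], None]) -> None:
    """Pass only the changes for tables in the replication set to the handler."""
    for change in changes:
        if repl_set.contains(change.table):
            handler(change)
```

The handler (built-in replicator or a plugin) never sees changes for tables outside the set, so it needs no filtering configuration of its own.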

I see at least the following benefits in this approach:

1) Allow custom CDC solutions without interacting with the built-in replication configuration

2) Automatic setup for cascaded replication - every replica knows which tables are allowed for replication and reuses these rules without any explicit include/exclude settings

3) We may allow modification of non-replicated tables in read-only replicas - this will not cause replication conflicts

4) With some additional effort, we could allow the replication set to be changed at runtime, without restarting the server (it could be bad practice in general, but perhaps useful to quickly fix some configuration mistake)

Dimitry Sibiryakov has kindly provided a pull request implementing this feature using the CREATE|ALTER TABLE extensions. It uses RDB$FLAGS for storage and thus doesn't require ODS changes. I suppose this can be accepted as a straightforward solution for v4. However, it may be somewhat limited in the long term. So I'd like to have it discussed before accepting the PR.

One thing I'm worried about is whether a single global replication set is enough, or whether it would be useful to have many independent replication sets. Some examples of how they could be used:

1) Two slightly different global replication sets are defined, only one of them is active at a time, but we can switch between them (e.g. via enable/disable commands)

2) Different tables (separated by some rule) are included in different replication sets which are all active together, and their intersection is used by the CDC publisher. This may be useful if these replication sets have some declarative customizations (see below).

3) Different replication sets are declared as intended for different CDC plugins. This implies that multiple CDC plugins may be configured independently. In this case, the CDC publisher checks the source (table) against the target (plugin) before sending the changes.
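
As an illustration of case 3 only, the source/target check could look roughly like the Python sketch below. The function name, the plugin names and the mapping structure are all made up for the example; nothing like this exists in the code base today.

```python
from typing import Dict, List, Set


def plugins_for_table(table: str,
                      sets_by_plugin: Dict[str, Set[str]]) -> List[str]:
    """Return the (hypothetical) plugins whose replication set contains the table.

    The publisher would send a change only to these plugins.
    """
    return sorted(plugin
                  for plugin, tables in sets_by_plugin.items()
                  if table in tables)
```

For example, with `{"builtin": {"ORDERS", "CUSTOMERS"}, "audit_plugin": {"ORDERS"}}`, a change to CUSTOMERS would go only to the built-in replicator, while a change to ORDERS would go to both.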

These cases are purely theoretical, but I believe we should consider them and decide whether it's worth preparing for them or not.

Second, IMHO declaring tables as "publishable" via CREATE|ALTER TABLE is too restrictive. I'd rather manage the replication set using some global commands, be it ALTER DATABASE or something different, that allow including/excluding all tables at once, or a comma-separated list of tables, or maybe tables by mask (regexp?). Of course, both SQL solutions (database level and table level) may co-exist.

Finally, if we consider the replication set to be a filter, it may also be useful to limit the published change set to some particular operations (INSERT|UPDATE|DELETE) or even some particular rows (WHERE filter). I doubt this is useful for replication per se, but this may allow something similar to "change views" in InterBase, currently with a CDC plugin acting as a client, but perhaps it could be extended later to interact with the real client application.
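
A rough Python sketch of such a per-table rule, combining an operation filter with an optional row predicate, is shown below. The `TableRule` type and `is_published` function are hypothetical; the Python callable merely stands in for a WHERE condition that would really be declared in SQL.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Optional

Row = Dict[str, Any]


@dataclass(frozen=True)
class TableRule:
    """Hypothetical per-table publishing rule."""
    operations: FrozenSet[str] = frozenset({"INSERT", "UPDATE", "DELETE"})
    # Stand-in for a WHERE filter; None means "all rows".
    row_filter: Optional[Callable[[Row], bool]] = None


def is_published(rules: Dict[str, TableRule],
                 table: str, operation: str, row: Row) -> bool:
    """Check table membership first, then the operation, then the row predicate."""
    rule = rules.get(table)
    if rule is None:
        return False  # table is not in the replication set
    if operation not in rule.operations:
        return False
    return rule.row_filter is None or rule.row_filter(row)
```

A rule such as `TableRule(frozenset({"INSERT"}), lambda r: r["AMOUNT"] > 100)` would then publish only inserts of rows with AMOUNT above 100.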

And one partially related question from another angle: does it also make sense to implement replica-side declarative filtering? I mean the case where changes for all tables are journaled but for some reason only some tables should be applied to the replica - e.g. two independent replicas with different filters but replicated from the same master journal (to avoid double journaling). If this feature is desirable, then how should the master-side filter (replication set) co-exist with the replica-side filter?

Please provide your feedback on these questions. I'm not talking about implementing everything in FB4, I just need to understand how to build the foundation that could be extended later with minimal effort.


Dmitry


Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
