Re: [infinispan-dev] Infinispan and change data capture

Adrian Nistor Mon, 11 Jul 2016 01:43:47 -0700

Hi Randall,

Infinispan supports both push and pull access models. The push model is supported by events (and listeners), which are cluster wide and are available in both library and remote mode (hotrod). The notification system is pretty advanced as there is a filtering mechanism available that can use a hand coded filter / converter or one specified in jpql (experimental atm). Getting a snapshot of the initial data is also possible. But Infinispan does not produce a transaction log to be used for determining all changes that happened since a previous connection time, so you'll always have to get a new full snapshot when re-connecting.
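To make the consequence of "events plus snapshot, but no transaction log" concrete, here is a minimal sketch in plain Python. It does not use Infinispan's API: `TinyCache`, `Connector`, the event tuples and the `key_filter` parameter are all hypothetical stand-ins. It only illustrates why a consumer that was disconnected has to take a fresh full snapshot.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical event shape (kind, key, value); not Infinispan's event classes.
Event = Tuple[str, str, Optional[str]]


class TinyCache:
    """In-memory stand-in for a cache that pushes events to listeners."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._listeners: List[Callable[[Event], None]] = []

    def add_listener(self, listener, key_filter=lambda key: True):
        # The filter is applied before delivery, so the listener never
        # sees events for keys it is not interested in.
        def filtered(event: Event) -> None:
            if key_filter(event[1]):
                listener(event)
        self._listeners.append(filtered)
        return filtered

    def remove_listener(self, handle) -> None:
        self._listeners.remove(handle)

    def snapshot(self) -> Dict[str, str]:
        return dict(self._data)

    def put(self, key: str, value: str) -> None:
        kind = "modified" if key in self._data else "created"
        self._data[key] = value
        self._notify((kind, key, value))

    def remove(self, key: str) -> None:
        if key in self._data:
            del self._data[key]
            self._notify(("removed", key, None))

    def _notify(self, event: Event) -> None:
        # Events are pushed once and not retained: there is no log to replay.
        for listener in list(self._listeners):
            listener(event)


class Connector:
    """Keeps a local view of the cache from a snapshot plus pushed events."""

    def __init__(self, cache: TinyCache, key_filter=lambda key: True) -> None:
        self.cache = cache
        self.key_filter = key_filter
        self.view: Dict[str, str] = {}
        self.handle = None

    def connect(self) -> None:
        # Changes made while disconnected cannot be recovered from events,
        # so every (re)connect rebuilds the view from a full snapshot.
        self.handle = self.cache.add_listener(self._on_event, self.key_filter)
        self.view = {k: v for k, v in self.cache.snapshot().items()
                     if self.key_filter(k)}

    def disconnect(self) -> None:
        self.cache.remove_listener(self.handle)
        self.handle = None

    def _on_event(self, event: Event) -> None:
        kind, key, value = event
        if kind == "removed":
            self.view.pop(key, None)
        else:
            self.view[key] = value
```

In this sketch, a removal that happens between `disconnect()` and the next `connect()` is never delivered as an event; the connector only learns about it because the new snapshot no longer contains the key.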

So if Infinispan is the data store I would base the Debezium connector implementation on Infinispan's event notification system. Not sure about the other use case though.


Adrian

On 07/09/2016 04:38 PM, Randall Hauch wrote:

The Debezium project [1] is working on building change data capture connectors for a variety of databases. MySQL is available now, MongoDB will be soon, and PostgreSQL and Oracle are next on our roadmap.
One way in which Debezium and Infinispan can be used together is when Infinispan is being used as a cache for data stored in a database. In this case, Debezium can capture the changes to the database and produce a stream of events; a separate process can consume these changes and evict entries from an Infinispan cache.
If Infinispan is to be used as a data store, then it would be useful for Debezium to be able to capture those changes so other apps/services can consume the changes. First of all, does this make sense? Secondly, if it does, then Debezium would need an Infinispan connector, and it’s not clear to me how that connector might capture the changes from Infinispan.
Debezium typically monitors the log of transactions/changes that are committed to a database. Of course how this works varies for each type of database. For example, MySQL internally produces a transaction log that contains information about every committed row change, and MySQL ensures that every committed change is included and that non-committed changes are excluded. The MySQL mechanism is actually part of the replication mechanism, so slaves update their internal state by reading the master’s log. The Debezium MySQL connector [2] simply reads the same log.
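The eviction use case can be sketched as follows. This is an illustration only: the event dictionaries (`table`, `id`, `op`), the `table:id` cache key scheme and the `evict_on_change` helper are assumptions made for the example, not Debezium's event format or Infinispan's API, and a plain dict stands in for the cache.

```python
from typing import Dict, Iterable, List


def evict_on_change(cache: Dict[str, str], events: Iterable[dict]) -> List[str]:
    """Evict the cached entry for every row named in a change event.

    Returns the keys that were actually present and evicted.
    """
    evicted: List[str] = []
    for event in events:
        # Assumed key scheme: "<table>:<primary key>".
        key = f"{event['table']}:{event['id']}"
        # Any kind of change (create, update, delete) makes the cached
        # copy stale, so the entry is dropped regardless of event['op'].
        if cache.pop(key, None) is not None:
            evicted.append(key)
    return evicted
```

Evicting rather than updating keeps the consumer simple: the next read misses the cache and reloads the current row from the database.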
Infinispan has several mechanisms that may be useful:

  * Interceptors - See [3]. This seems pretty straightforward and IIUC
    provides access to all internal operations. However, it’s not
    clear to me whether a single interceptor will see all the changes
    in a cluster (perhaps in local and replicated modes) or only those
    changes that happen on that particular node (in distributed mode).
    It’s also not clear whether this interceptor is called within the
    context of the cache’s transaction, so if a failure happens just
    at the wrong time whether a change might be made to the cache but
    is not seen by the interceptor (or vice versa).
  * Cross-site replication - See [4][5]. A potential advantage of this
    mechanism appears to be that it is defined (more) globally, and it
    appears to function if the remote backup comes back online after
    being offline for a period of time.
  * State transfer - is it possible to participate as a non-active
    member of the cluster, and to effectively read all state transfer
    activities that occur within the cluster?
  * Cache store - tie into the cache store mechanism, perhaps by
    wrapping an existing cache store and sitting between the cache and
    the cache store
  * Monitor the cache store - don’t monitor Infinispan at all, and
    instead monitor the store in which Infinispan is storing entries.
    (This is probably the least attractive, since some stores can’t be
    monitored, or because the store is persisting an opaque binary value.)


Are there other mechanisms that might be used?
There are a couple of important requirements for change data capture to be able to work correctly:
 1. Upon initial connection, the CDC connector must be able to obtain
    a snapshot of all existing data, followed by seeing all changes to
    data that may have occurred since the snapshot was started. If the
    connector is stopped/fails, upon restart it needs to be able to
    reconnect and either see all changes that occurred since it last
    was capturing changes, or perform a snapshot. (Performing a
    snapshot upon restart is very inefficient and undesirable.) This
    works as follows: the CDC connector only records the “offset” in
    the source’s sequence of events; what this “offset” entails
    depends on the source. Upon restart, the connector can use this
    offset information to coordinate with the source where it wants to
    start reading. (In MySQL and PostgreSQL, every event includes the
    filename of the log and position in that file. MongoDB includes in
    each event the monotonically increasing timestamp of the transaction.)
 2. No change can be missed, even when things go wrong and components
    crash.
 3. When a new entry is added, the “after” state of the entity will be
    included. When an entry is updated, the “after” state will be
    included in the event; if possible, the event should also include
    the “before” state. When an entry is removed, the “before” state
    should be included in the event.
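The offset and before/after requirements above can be sketched with a toy source that keeps an ordered change log. Everything here is hypothetical (`LoggedStore`, `ChangeEvent`, `OffsetConnector`, and the use of a list index as the "offset"); real sources define the offset differently, and the initial snapshot step is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int              # position in the source's sequence of events
    op: str                  # "create", "update" or "delete"
    key: str
    before: Optional[str]    # None for "create"
    after: Optional[str]     # None for "delete"


class LoggedStore:
    """Toy data store that appends every committed change to a log."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._log: List[ChangeEvent] = []

    def put(self, key: str, value: str) -> None:
        op = "update" if key in self._data else "create"
        before = self._data.get(key)
        self._data[key] = value
        self._append(op, key, before, value)

    def delete(self, key: str) -> None:
        if key in self._data:
            before = self._data.pop(key)
            self._append("delete", key, before, None)

    def _append(self, op, key, before, after) -> None:
        self._log.append(ChangeEvent(len(self._log), op, key, before, after))

    def read_from(self, offset: int) -> List[ChangeEvent]:
        # The source serves all events at or after the requested offset.
        return self._log[offset:]


class OffsetConnector:
    """Records only the next offset to read, so it can resume after a restart."""

    def __init__(self, store: LoggedStore, start_offset: int = 0) -> None:
        self.store = store
        self.next_offset = start_offset

    def poll(self) -> List[ChangeEvent]:
        events = self.store.read_from(self.next_offset)
        if events:
            self.next_offset = events[-1].offset + 1
        return events
```

A restarted connector is constructed with the last `next_offset` it persisted, and it then sees exactly the changes it missed without taking another snapshot. This is the capability that a source without a change log cannot offer.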


Any thoughts or advice would be greatly appreciated.

Best regards,

Randall


[1] http://debezium.io
[2] http://debezium.io/docs/connectors/mysql/
[3] http://infinispan.org/docs/stable/user_guide/user_guide.html#_custom_interceptors_chapter
[4] http://infinispan.org/docs/stable/user_guide/user_guide.html#CrossSiteReplication
[5] https://github.com/infinispan/infinispan/wiki/Design-For-Cross-Site-Replication
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
