[
https://issues.apache.org/jira/browse/NIFI-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939409#comment-15939409
]
Matt Burgess commented on NIFI-3413:
------------------------------------
Notes on testing:
To enable binlog on my MySQL instance, I added the following to the [mysqld]
section of my.cnf:
server-id=1
log-bin=master
binlog_format=row
This sets the server ID to 1, sets the prefix for binlog files to "master",
and enables row-level binlog events.
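As a quick sanity check before testing, the three settings above can be
verified by parsing the config file. This is only a minimal Python sketch: the
helper name check_binlog_settings is hypothetical, the expected values simply
mirror the ones in this note, and it assumes a my.cnf simple enough for
configparser (directives such as !includedir are not handled).

```python
import configparser

# Expected values mirror the note above; adjust for your own instance.
REQUIRED = {"server-id": "1", "log-bin": "master", "binlog_format": "row"}

SAMPLE = """
[mysqld]
server-id=1
log-bin=master
binlog_format=row
"""


def check_binlog_settings(cnf_text):
    """Return {setting: actual_value} for settings that are missing or differ.

    An empty dict means all three binlog settings match REQUIRED.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags with no value
        strict=False,          # tolerate repeated options
        interpolation=None,    # do not treat '%' specially
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return {key: None for key in REQUIRED}
    section = parser["mysqld"]
    problems = {}
    for key, expected in REQUIRED.items():
        actual = section.get(key)  # None when the option is absent
        if actual != expected:     # exact match, for simplicity
            problems[key] = actual
    return problems


if __name__ == "__main__":
    print(check_binlog_settings(SAMPLE))
```

An empty result means the [mysqld] section carries all three settings; any
other result lists what is missing or different.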
I wrote a Groovy script to dump binlog events to the console; it is a Gist
here: https://gist.github.com/mattyb149/61ea035e5e917e65fd05c74bec0d090b
I also have a test template that takes the output of GetChangeDataCaptureMySQL
and translates the JSON events into SQL that can be executed on the target
system. To round out the template, the EnforceOrder processor is needed to
effectively "sort" the events after they have been processed. The original
template is here:
https://gist.github.com/mattyb149/5694b1c593adb56b40a84f92964ec9b7
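To illustrate the idea behind the template (not the template itself), here is
a minimal Python sketch that orders events and turns each one into a
parameterized SQL statement. The event field names (sequence_id, type,
database, table_name, columns, last_value) and the key column "id" are
assumptions made for illustration, not the processor's documented schema, and
the in-memory sort only stands in for what EnforceOrder does in the flow.

```python
def event_to_sql(event, key_column="id"):
    """Translate one CDC event (assumed shape) into (statement, parameters)."""
    table = event["database"] + "." + event["table_name"]
    columns = event["columns"]
    names = [c["name"] for c in columns]
    values = [c["value"] for c in columns]
    kind = event["type"]

    if kind == "insert":
        column_list = ", ".join(names)
        placeholders = ", ".join("?" for _ in names)
        return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})", values

    # For updates and deletes, locate the row by its (assumed) key column.
    # An update may carry the pre-change value under the assumed "last_value".
    key_value = next(
        c.get("last_value", c["value"]) for c in columns if c["name"] == key_column
    )

    if kind == "delete":
        return f"DELETE FROM {table} WHERE {key_column} = ?", [key_value]

    if kind == "update":
        assignments = ", ".join(f"{name} = ?" for name in names)
        statement = f"UPDATE {table} SET {assignments} WHERE {key_column} = ?"
        return statement, values + [key_value]

    raise ValueError(f"unsupported event type: {kind}")


def events_to_sql(events):
    """Order events by the assumed sequence_id field, then translate each."""
    ordered = sorted(events, key=lambda e: e["sequence_id"])
    return [event_to_sql(e) for e in ordered]


if __name__ == "__main__":
    row = [{"name": "id", "value": 1}, {"name": "name", "value": "a"}]
    events = [
        {"sequence_id": 2, "type": "delete", "database": "db",
         "table_name": "users", "columns": row},
        {"sequence_id": 1, "type": "insert", "database": "db",
         "table_name": "users", "columns": row},
    ]
    for statement, params in events_to_sql(events):
        print(statement, params)
```

Returning placeholders plus a parameter list, rather than inlining values,
avoids quoting problems when the statements are executed on the target.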
> Implement a GetChangeDataCapture processor
> ------------------------------------------
>
> Key: NIFI-3413
> URL: https://issues.apache.org/jira/browse/NIFI-3413
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: Matt Burgess
>
> Database systems such as MySQL, Oracle, and SQL Server allow access to their
> transaction logs so that external clients can have a "change data capture"
> (CDC) capability. I propose a GetChangeDataCapture processor to enable this
> in NiFi.
> The processor would be configured with a DBCPConnectionPool controller
> service, as well as a Database Type property (similar to the one in
> QueryDatabaseTable) for database-specific handling. Additional properties
> might include the CDC table name, etc. Additional database-specific
> properties could be handled using dynamic properties (and the documentation
> should reflect this).
> The processor would accept no incoming connections (it is a "Get" or source
> processor), would be intended to run on the primary node only as a single
> threaded processor, and would generate a flow file for each operation
> (e.g., INSERT, UPDATE, DELETE) in one or more formats (e.g., JSON).
> The flow files would be transferred in time order (to enable a replication
> solution, for example), perhaps with some auto-incrementing attribute to also
> indicate order if need be.