[jira] [Commented] (NIFI-3413) Implement a GetChangeDataCapture processor

Matt Burgess (JIRA) Thu, 23 Mar 2017 16:16:59 -0700

    [ 
https://issues.apache.org/jira/browse/NIFI-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939409#comment-15939409
 ]


Matt Burgess commented on NIFI-3413:
------------------------------------

Notes on testing:

To enable binlog on my MySQL instance, I added the following to the [mysqld] 
section of my.cnf:

server-id=1
log-bin=master
binlog_format=row

This sets the server ID to 1, sets the prefix for binlog files to "master", and 
enables row-level binlog events.
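
As a quick sanity check before starting the processor, the three settings 
above can be verified programmatically. The following is only a minimal 
sketch in Python: the helper name missing_binlog_settings and the inline 
MY_CNF text are hypothetical, and it compares option names exactly as they 
are spelled above (it does not follow include directives or alternative 
spellings of the same option).

```python
import configparser

# The three lines from the comment, wrapped in the [mysqld] section they belong to.
MY_CNF = """\
[mysqld]
server-id=1
log-bin=master
binlog_format=row
"""

# Settings needed for row-level binlog events, as listed above.
REQUIRED = {"server-id": "1", "log-bin": "master", "binlog_format": "row"}


def missing_binlog_settings(cnf_text):
    """Return {option: (expected, actual)} for settings that differ or are absent."""
    # interpolation=None so '%' characters in values are taken literally;
    # allow_no_value/strict=False tolerate flag-style and repeated options.
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    problems = {}
    for option, expected in REQUIRED.items():
        actual = parser.get("mysqld", option, fallback=None)
        if actual is None or actual.lower() != expected:
            problems[option] = (expected, actual)
    return problems


if __name__ == "__main__":
    print(missing_binlog_settings(MY_CNF) or "binlog settings look fine")
```

An empty result means all three options are present with the expected 
values; anything else lists what would need to change in my.cnf.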

I wrote a Groovy script that dumps binlog events to the console; it is 
available as a Gist here: https://gist.github.com/mattyb149/61ea035e5e917e65fd05c74bec0d090b

I also have a test template that takes the output of GetChangeDataCaptureMySQL 
and translates the JSON events into SQL that can be executed on the target 
system. To round out the template, the EnforceOrder processor is needed to 
effectively "sort" the events after they have been processed. The original 
template is here: 
https://gist.github.com/mattyb149/5694b1c593adb56b40a84f92964ec9b7
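
To illustrate what the template does outside of NiFi, here is a minimal 
sketch in Python of the two steps: translating JSON events into SQL and 
putting them back in order. The event layout is an assumption made for this 
example (the field names "type", "database", "table_name", "columns", 
"value", "last_value" and "sequence_id" are hypothetical, not a statement of 
the processor's actual output), and a plain sort stands in for EnforceOrder, 
which works differently inside a running flow.

```python
import json


def sql_literal(value):
    """Render a JSON value as a SQL literal (illustrative quoting only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def predicate(name, value):
    """Build one WHERE term; NULL has to be matched with IS NULL."""
    if value is None:
        return f"{name} IS NULL"
    return f"{name} = {sql_literal(value)}"


def event_to_sql(event):
    """Translate one (assumed-format) CDC event into a SQL statement."""
    table = f"{event['database']}.{event['table_name']}"
    cols = event["columns"]
    kind = event["type"]
    if kind == "insert":
        names = ", ".join(c["name"] for c in cols)
        values = ", ".join(sql_literal(c["value"]) for c in cols)
        return f"INSERT INTO {table} ({names}) VALUES ({values})"
    if kind == "delete":
        where = " AND ".join(predicate(c["name"], c["value"]) for c in cols)
        return f"DELETE FROM {table} WHERE {where}"
    if kind == "update":
        # Assumption: each column carries its old value in "last_value".
        assignments = ", ".join(
            f"{c['name']} = {sql_literal(c['value'])}" for c in cols
        )
        where = " AND ".join(
            predicate(c["name"], c["last_value"]) for c in cols
        )
        return f"UPDATE {table} SET {assignments} WHERE {where}"
    raise ValueError(f"unsupported event type: {kind!r}")


def ordered_sql(json_events):
    """Parse events, sort them by the assumed sequence_id, and emit SQL."""
    events = [json.loads(text) for text in json_events]
    events.sort(key=lambda e: e["sequence_id"])
    return [event_to_sql(e) for e in events]
```

Statements built by string concatenation like this are only suitable for a 
test setup; anything beyond that would call for parameterized statements on 
the target system.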


> Implement a GetChangeDataCapture processor
> ------------------------------------------
>
>                 Key: NIFI-3413
>                 URL: https://issues.apache.org/jira/browse/NIFI-3413
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>
> Database systems such as MySQL, Oracle, and SQL Server allow access to their 
> transactional logs and such, in order for external clients to have a "change 
> data capture" (CDC) capability. I propose a GetChangeDataCapture processor to 
> enable this in NiFi.
> The processor would be configured with a DBCPConnectionPool controller 
> service, as well as a Database Type property (similar to the one in 
> QueryDatabaseTable) for database-specific handling. Additional properties 
> might include the CDC table name, etc.  Additional database-specific 
> properties could be handled using dynamic properties (and the documentation 
> should reflect this).
> The processor would accept no incoming connections (it is a "Get" or source 
> processor), would be intended to run on the primary node only as a single 
> threaded processor, and would generate a flow file for each operation 
> (INSERT, UPDATE, DELETE, e.g.) in one or some number of formats (JSON, e.g.). 
> The flow files would be transferred in time order (to enable a replication 
> solution, for example), perhaps with some auto-incrementing attribute to also 
> indicate order if need be.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
