[ https://issues.apache.org/jira/browse/NIFI-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess updated NIFI-3413:
-------------------------------
    Description: 
Database systems such as MySQL, Oracle, and SQL Server expose their
transaction logs (or equivalent change records) so that external clients can
implement "change data capture" (CDC). As an initial effort, I propose a
CaptureChangeMySQL processor to enable this in NiFi. This work would also
introduce any APIs that follow-on Jira cases need to implement CDC processors
for other databases such as Oracle, SQL Server, PostgreSQL, etc.

The processor would include the properties needed for database connectivity
(unless a DBCPConnectionPool would suffice), as well as any needed to
configure third-party clients (e.g., mysql-binlog-connector). It would also
need to maintain a "sequence ID" so that, for example, an EnforceOrder
processor (NIFI-3414) could guarantee the order of CDC events for use cases
such as replication. It will likely need State Management for that, and may
need other facilities such as a DistributedMapCache to hold information
(e.g., column names and types) that enriches the raw CDC events.
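
As a rough illustration of the sequence ID idea above, here is a minimal Java
sketch. It is not the proposed implementation: the class name
CdcSequenceTracker, the state key "cdc.sequence.id", and the plain Map that
stands in for NiFi State Management are all assumptions made for the example.
It only shows how a monotonically increasing ID could survive a restart if the
next value is saved and restored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of NiFi): hands out increasing sequence IDs
// and round-trips the next value through a plain map. The map is a stand-in
// for whatever state store the processor ends up using.
class CdcSequenceTracker {
    // Assumed key name, chosen only for this sketch.
    static final String STATE_KEY = "cdc.sequence.id";

    private long next;

    // Restore the next ID from saved state, or start at 0 if none exists.
    CdcSequenceTracker(Map<String, String> savedState) {
        String saved = savedState.get(STATE_KEY);
        this.next = (saved == null) ? 0L : Long.parseLong(saved);
    }

    // Return the ID for the current event and advance the counter.
    long nextId() {
        return next++;
    }

    // Capture the state that would need to be persisted.
    Map<String, String> snapshot() {
        Map<String, String> state = new HashMap<>();
        state.put(STATE_KEY, Long.toString(next));
        return state;
    }

    public static void main(String[] args) {
        CdcSequenceTracker tracker = new CdcSequenceTracker(new HashMap<>());
        System.out.println("first event id: " + tracker.nextId());
        System.out.println("second event id: " + tracker.nextId());

        // Simulate a restart: a new tracker resumes from the saved state.
        CdcSequenceTracker restarted = new CdcSequenceTracker(tracker.snapshot());
        System.out.println("id after restart: " + restarted.nextId());
    }
}
```

A downstream consumer could then order events by this ID; whether that is done
by EnforceOrder (NIFI-3414) or by something else is outside this sketch.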

The processor would accept no incoming connections (it is a "get" or source
processor), would be intended to run on the primary node only as a
single-threaded processor, and would generate a flow file for each operation
(e.g., INSERT, UPDATE, DELETE) in one or more formats (e.g., JSON).
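
To make the "one flow file per operation" idea concrete, the following Java
sketch renders a single change event as a JSON string. Everything here is an
assumption for illustration: the class name CdcEventJson, the field names
(sequence_id, type, table, columns), and the choice to treat all column values
as strings. The actual output format is still to be decided.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: serializes one CDC operation to JSON by hand so the
// example needs no third-party library. Field names are assumptions.
class CdcEventJson {

    // Quote a string as a JSON string literal, escaping special characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Build the JSON content for one operation (one flow file's worth).
    static String toJson(long sequenceId, String operation, String table,
                         Map<String, String> columns) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"sequence_id\":").append(sequenceId)
          .append(",\"type\":").append(quote(operation))
          .append(",\"table\":").append(quote(table))
          .append(",\"columns\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : columns.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(e.getKey())).append(':');
            sb.append(e.getValue() == null ? "null" : quote(e.getValue()));
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "1");
        columns.put("name", "alice");
        System.out.println(toJson(0L, "insert", "users", columns));
    }
}
```

In a real processor the JSON would more likely be produced with a library, and
the resulting bytes written as the content of a new flow file.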

  was:
Database systems such as MySQL, Oracle, and SQL Server allow access to their 
transactional logs and such, in order for external clients to have a "change 
data capture" (CDC) capability. I propose a GetChangeDataCapture processor to 
enable this in NiFi.

The processor would be configured with a DBCPConnectionPool controller service, 
as well as a Database Type property (similar to the one in QueryDatabaseTable) 
for database-specific handling. Additional properties might include the CDC 
table name, etc.  Additional database-specific properties could be handled 
using dynamic properties (and the documentation should reflect this).

The processor would accept no incoming connections (it is a "Get" or source
processor), would be intended to run on the primary node only as a
single-threaded processor, and would generate a flow file for each operation
(e.g., INSERT, UPDATE, DELETE) in one or more formats (e.g., JSON). The flow
files would be transferred in time order (to enable a replication solution, for
example), perhaps with an auto-incrementing attribute to also indicate order
if need be.



> Implement a CaptureChangeMySQL processor
> ----------------------------------------
>
>                 Key: NIFI-3413
>                 URL: https://issues.apache.org/jira/browse/NIFI-3413
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>             Fix For: 1.2.0
>
>
> Database systems such as MySQL, Oracle, and SQL Server expose their
> transaction logs (or equivalent change records) so that external clients can
> implement "change data capture" (CDC). As an initial effort, I propose a
> CaptureChangeMySQL processor to enable this in NiFi. This work would also
> introduce any APIs that follow-on Jira cases need to implement CDC
> processors for other databases such as Oracle, SQL Server, PostgreSQL, etc.
> The processor would include the properties needed for database connectivity
> (unless a DBCPConnectionPool would suffice), as well as any needed to
> configure third-party clients (e.g., mysql-binlog-connector). It would also
> need to maintain a "sequence ID" so that, for example, an EnforceOrder
> processor (NIFI-3414) could guarantee the order of CDC events for use cases
> such as replication. It will likely need State Management for that, and may
> need other facilities such as a DistributedMapCache to hold information
> (e.g., column names and types) that enriches the raw CDC events.
> The processor would accept no incoming connections (it is a "get" or source
> processor), would be intended to run on the primary node only as a
> single-threaded processor, and would generate a flow file for each operation
> (e.g., INSERT, UPDATE, DELETE) in one or more formats (e.g., JSON).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
