Re: Data Load

Matt Burgess Tue, 02 May 2017 08:01:04 -0700

Anil,

Is this a "one-time" migration, meaning you would take the on-prem
tables and put them on the cloud DB just once? Or would this be an
incremental operation, where you do the initial move and then take any
"new" rows from the source and apply them to the target?  For the
latter, there are a couple of options:

1) Rebuild the cloud DB periodically. You can use ExecuteSQL ->
[processors] -> PutSQL after perhaps deleting your target
DB/tables/etc.  This could be time-consuming and expensive. The
processors in-between probably include ConvertAvroToJSON and
ConvertJSONToSQL.
2) Use QueryDatabaseTable or (GenerateTableFetch -> ExecuteSQL) to get
the source data. For this your table would need a column whose values
always increase, that column would comprise the value of the "Maximum
Value Column" property in the aforementioned processors' configuration
dialogs. You would need one QueryDatabaseTable or GenerateTableFetch
for each table in your DB.

In addition to these current solutions, as of the upcoming NiFi 1.2.0
release, you have the following options:
3) If the source database is MySQL, you can use the CaptureChangeMySQL
processor to get binary log events flowing through various processors
into PutDatabaseRecord to place them at the source. This pattern is
true Change Data Capture (CDC) versus the other two options above.
4) Option #2 will be improved such that GenerateTableFetch will accept
incoming flow files, so you can use ListDatabaseTables ->
GenerateTableFetch -> ExecuteSQL to handle multiple tables with one
flow.

If this is a one-time migration, a data flow tool might not be the
best choice, you could consider something like Flyway [1] instead.

Regards,
Matt

[1] https://flywaydb.org/documentation/command/migrate

On Tue, May 2, 2017 at 10:41 AM, Anil Rai <[email protected]> wrote:
> I have a simple use case.
>
> DB (On Premise) and DB (On Cloud).
>
> I want to use nifi to extract data from on prem DB (huge volumes) and
> insert into the same table structure that is hosted on cloud.
>
> I could use ExecuteSQL on both sides of the fence (to extract from on prem
> and insert onto cloud). What processors are needed in between (if at all)?
> As i am not doing any transformations at all....it is just extract and load
> use case

Re: Data Load

Reply via email to