Yes, that sounds like your best bet, assuming the table you want to migrate has a column you can use as the "Maximum Value Column". Then a flow might look like:

QueryDatabaseTable -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL
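For illustration, this is roughly the kind of statement QueryDatabaseTable ends up issuing on each run (a sketch, not NiFi's exact generated SQL; the "orders" table and "order_id" column are made up here as an example of an ever-increasing Maximum Value Column):

    -- Hypothetical table/column for illustration. NiFi keeps the largest
    -- value it has seen in processor state and only fetches newer rows on
    -- the next run.
    SELECT *
    FROM orders
    WHERE order_id > 12345;  -- 12345 = last maximum value recorded so far

Only the rows added since the previous run come back, which is what gives you the incremental load.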
In this flow the target tables would need to be created beforehand. You might be able to do that with pg_dump or with some fancy SQL that you could send to PutSQL in a separate (do-ahead) flow [1].
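As a rough sketch of that "fancy SQL" (a deliberately simplified example, not taken from [1]: it assumes everything lives in the public schema and ignores column lengths/precision, defaults, constraints, and indexes, which is why pg_dump --schema-only is usually the easier route):

    -- Build bare-bones CREATE TABLE statements from information_schema.
    -- Simplified on purpose; real schema migration should use pg_dump.
    SELECT 'CREATE TABLE ' || table_name || ' ('
           || string_agg(column_name || ' ' || data_type, ', '
                         ORDER BY ordinal_position)
           || ');'
    FROM information_schema.columns
    WHERE table_schema = 'public'
    GROUP BY table_name;

Each row that comes back is a CREATE TABLE statement you could send to PutSQL against the cloud database.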
For multiple tables, you will need one QueryDatabaseTable for each table; depending on the number of tables and the latency for getting/putting rows, you may be able to share the downstream processing. If that creates a bottleneck, you may want a copy of the above flow for each table. This is drastically improved in NiFi 1.2.0, as you can use ListDatabaseTables -> GenerateTableFetch -> RPG -> Input Port -> ExecuteSQL to perform the migration in parallel across a NiFi cluster.

Regards,
Matt

[1] https://serverfault.com/questions/231952/is-there-a-mysql-equivalent-of-show-create-table-in-postgres

On Tue, May 2, 2017 at 11:18 AM, Anil Rai <anilrain...@gmail.com> wrote:
> Thanks Matt for the quick reply. We are using the NiFi 1.0 release as of now.
> It's a Postgres DB on both sides (on prem and on cloud),
> and yes, incremental load is what I am looking for,
> so with that, you recommend option #2?
>
> On Tue, May 2, 2017 at 11:00 AM, Matt Burgess <mattyb...@apache.org> wrote:
>
>> Anil,
>>
>> Is this a "one-time" migration, meaning you would take the on-prem
>> tables and put them on the cloud DB just once? Or would this be an
>> incremental operation, where you do the initial move and then take any
>> "new" rows from the source and apply them to the target? For the
>> latter, there are a couple of options:
>>
>> 1) Rebuild the cloud DB periodically. You can use ExecuteSQL ->
>> [processors] -> PutSQL after perhaps deleting your target
>> DB/tables/etc. This could be time-consuming and expensive. The
>> processors in between probably include ConvertAvroToJSON and
>> ConvertJSONToSQL.
>> 2) Use QueryDatabaseTable or (GenerateTableFetch -> ExecuteSQL) to get
>> the source data. For this your table would need a column whose values
>> always increase; that column would be the value of the "Maximum
>> Value Column" property in the aforementioned processors' configuration
>> dialogs. You would need one QueryDatabaseTable or GenerateTableFetch
>> for each table in your DB.
>>
>> In addition to these current solutions, as of the upcoming NiFi 1.2.0
>> release, you have the following options:
>> 3) If the source database is MySQL, you can use the CaptureChangeMySQL
>> processor to get binary log events flowing through various processors
>> into PutDatabaseRecord to apply them at the target. This pattern is
>> true Change Data Capture (CDC), versus the other two options above.
>> 4) Option #2 will be improved such that GenerateTableFetch will accept
>> incoming flow files, so you can use ListDatabaseTables ->
>> GenerateTableFetch -> ExecuteSQL to handle multiple tables with one
>> flow.
>>
>> If this is a one-time migration, a data flow tool might not be the
>> best choice; you could consider something like Flyway [1] instead.
>>
>> Regards,
>> Matt
>>
>> [1] https://flywaydb.org/documentation/command/migrate
>>
>> On Tue, May 2, 2017 at 10:41 AM, Anil Rai <anilrain...@gmail.com> wrote:
>> > I have a simple use case.
>> >
>> > DB (On Premise) and DB (On Cloud).
>> >
>> > I want to use NiFi to extract data from the on-prem DB (huge volumes) and
>> > insert it into the same table structure that is hosted on the cloud.
>> >
>> > I could use ExecuteSQL on both sides of the fence (to extract from on prem
>> > and insert onto cloud). What processors are needed in between (if at all)?
>> > As I am not doing any transformations at all, it is just an extract-and-load
>> > use case.