Hi Matt, I quickly developed this, and this is how I was able to do it:

DataLake <- ExecuteSQL -> ConvertAvroToJSON -> SplitJson -> EvaluateJsonPath -> ReplaceText -> PutSQL -> Postgres (on cloud)
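To give a concrete picture of the last few steps: EvaluateJsonPath
pulls each record's column values into flow file attributes, and
ReplaceText then rewrites the content as one INSERT statement per row,
roughly like this (the table and column names here are placeholders,
not my real schema):

    INSERT INTO customer (id, name, updated_at)
    VALUES ('${id}', '${name}', '${updated_at}')

PutSQL then executes a separate statement like that for every source
row.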
The problem is that this will not scale for huge volumes. Any thoughts?

Regards,
Anil

On Tue, May 2, 2017 at 12:07 PM, Matt Burgess <[email protected]> wrote:
> Yes, that sounds like your best bet, assuming you have the "Maximum
> Value Column" present in the table you want to migrate. Then a flow
> might look like:
>
> QueryDatabaseTable -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL
>
> In this flow the target tables would need to be created beforehand.
> You might be able to do that with pg_dump or with some fancy SQL that
> you could send to PutSQL in a separate (do-ahead) flow [1]. For
> multiple tables, you will need one QueryDatabaseTable for each table;
> depending on the number of tables and the latency for getting/putting
> rows, you may be able to share the downstream processing. If that
> creates a bottleneck, you may want a copy of the above flow for each
> table. This is drastically improved in NiFi 1.2.0, as you can use
> ListDatabaseTables -> GenerateTableFetch -> RPG -> Input Port ->
> ExecuteSQL to perform the migration in parallel across a NiFi cluster.
>
> Regards,
> Matt
>
> [1] https://serverfault.com/questions/231952/is-there-a-mysql-equivalent-of-show-create-table-in-postgres
>
> On Tue, May 2, 2017 at 11:18 AM, Anil Rai <[email protected]> wrote:
> > Thanks Matt for the quick reply. We are using the NiFi 1.0 release
> > as of now. It's a Postgres DB on both sides (on prem and on cloud),
> > and yes, incremental load is what I am looking for. So with that,
> > you recommend option #2?
> >
> > On Tue, May 2, 2017 at 11:00 AM, Matt Burgess <[email protected]> wrote:
> >
> >> Anil,
> >>
> >> Is this a "one-time" migration, meaning you would take the on-prem
> >> tables and put them on the cloud DB just once? Or would this be an
> >> incremental operation, where you do the initial move and then take
> >> any "new" rows from the source and apply them to the target? For
> >> the latter, there are a couple of options:
> >>
> >> 1) Rebuild the cloud DB periodically. You can use ExecuteSQL ->
> >> [processors] -> PutSQL after perhaps deleting your target
> >> DB/tables/etc. This could be time-consuming and expensive. The
> >> processors in between would probably include ConvertAvroToJSON and
> >> ConvertJSONToSQL.
> >> 2) Use QueryDatabaseTable or (GenerateTableFetch -> ExecuteSQL) to
> >> get the source data. For this, your table would need a column whose
> >> values always increase; that column's name would be the value of
> >> the "Maximum Value Column" property in the aforementioned
> >> processors' configuration dialogs. You would need one
> >> QueryDatabaseTable or GenerateTableFetch for each table in your DB
> >> (there's a sketch of the generated query below).
> >>
> >> In addition to these current solutions, as of the upcoming NiFi
> >> 1.2.0 release, you have the following options:
> >> 3) If the source database is MySQL, you can use the
> >> CaptureChangeMySQL processor to get binary log events flowing
> >> through various processors into PutDatabaseRecord to place them at
> >> the target. This pattern is true Change Data Capture (CDC) versus
> >> the other two options above.
> >> 4) Option #2 will be improved such that GenerateTableFetch will
> >> accept incoming flow files, so you can use ListDatabaseTables ->
> >> GenerateTableFetch -> ExecuteSQL to handle multiple tables with one
> >> flow.
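> >>
> >> To sketch what #2 generates under the covers: with a "Maximum
> >> Value Column" such as an updated_at timestamp, each run of
> >> QueryDatabaseTable (or GenerateTableFetch) effectively issues
> >> something like the following, where the table and column names are
> >> just placeholders:
> >>
> >>   SELECT *
> >>   FROM customer
> >>   WHERE updated_at > '2017-05-01 00:00:00'
> >>
> >> The literal in the WHERE clause is the largest value observed on
> >> the previous run; the processor stores that state so each run only
> >> fetches new rows.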
> >>
> >> If this is a one-time migration, a data flow tool might not be the
> >> best choice; you could consider something like Flyway [1] instead.
> >>
> >> Regards,
> >> Matt
> >>
> >> [1] https://flywaydb.org/documentation/command/migrate
> >>
> >> On Tue, May 2, 2017 at 10:41 AM, Anil Rai <[email protected]> wrote:
> >> > I have a simple use case.
> >> >
> >> > DB (On Premise) and DB (On Cloud).
> >> >
> >> > I want to use NiFi to extract data from the on-prem DB (huge
> >> > volumes) and insert it into the same table structure that is
> >> > hosted on the cloud.
> >> >
> >> > I could use ExecuteSQL on both sides of the fence (to extract
> >> > from on prem and insert onto the cloud). What processors are
> >> > needed in between (if at all)? As I am not doing any
> >> > transformations at all, it is just an extract-and-load use case.
