This should be possible using the Beam programmatic API: you can pass BigQueryIO a function that determines the destination BigQuery table from each input element, so one long-running job can route writes to tables that didn't exist when it was launched.
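
For example, here is a minimal sketch of that pattern (the "feature_set" field and the project/dataset names are placeholders I've invented for illustration, not Feast's actual schema):

  import com.google.api.services.bigquery.model.TableFieldSchema;
  import com.google.api.services.bigquery.model.TableRow;
  import com.google.api.services.bigquery.model.TableSchema;
  import java.util.Arrays;
  import org.apache.beam.sdk.Pipeline;
  import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
  import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
  import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
  import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
  import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
  import org.apache.beam.sdk.options.PipelineOptionsFactory;
  import org.apache.beam.sdk.transforms.Create;

  public class DynamicTableWrite {
    public static void main(String[] args) {
      Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

      // One schema shared by all destination tables; per-table schemas need
      // the DynamicDestinations variant of to().
      TableSchema schema = new TableSchema().setFields(Arrays.asList(
          new TableFieldSchema().setName("feature_set").setType("STRING"),
          new TableFieldSchema().setName("value").setType("FLOAT")));

      p.apply(Create.of(
              new TableRow().set("feature_set", "driver_stats").set("value", 1.0))
          .withCoder(TableRowJsonCoder.of()))
       .apply(BigQueryIO.writeTableRows()
          // The destination is computed per element, so one long-running job
          // can fan out to tables that did not exist at launch time.
          .to(elem -> new TableDestination(
              "my-project:my_dataset." + elem.getValue().get("feature_set"),
              "Table for one feature set"))
          .withSchema(schema)
          .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
          .withWriteDisposition(WriteDisposition.WRITE_APPEND));

      p.run().waitUntilFinish();
    }
  }

Note that withSchema() applies the same schema to every destination; if each table needs its own schema, the DynamicDestinations overload of to() also lets you return a schema per destination.
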
On Sat, May 30, 2020 at 9:20 PM Willem Pienaar <[email protected]> wrote:

> Hi JB,
>
> Apologies for resurrecting this thread, but I have a related question.
>
> We've built a feature store, Feast (https://github.com/feast-dev/feast),
> primarily on Beam. We have been very happy with our decision to use Beam
> thus far. Beam is mostly used as the ingestion layer that writes data into
> stores (BigQuery, Redis). I am currently implementing JdbcIO (for
> PostgreSQL) and it's working fine so far. I set up all the tables when the
> job is launched, and I write into different tables depending on the input
> elements.
>
> However, a problem we are facing is that schema changes are happening very
> rapidly based on our users' activity. Every time a user changes a
> collection of features/fields, we have to launch a new Dataflow job in
> order to support the new database schema. This can take 3-4 minutes, and
> every time the jobs are in an updating state we have to block all user
> activity, which is quite disruptive.
>
> What we want to do is dynamically configure the SQL insert statement based
> on the input elements. This would allow us to keep the same job running
> indefinitely, dramatically improving the user experience. We have found
> solutions for BigQueryIO and our other IOs, but not yet for JdbcIO. As far
> as I can tell, it isn't possible to modify the SQL insert statement to
> write to a new table, or to the same table with new columns, without
> restarting the job.
>
> Do you have any suggestions on how we can achieve the above? If it can't
> be done with the current implementation, would it be reasonable to
> contribute this functionality back to Beam?
>
> Regards,
> Willem
>
> On Tue, Mar 3, 2020, at 1:30 AM, Jean-Baptiste Onofre wrote:
> > Hi
> >
> > You have the setPrepareStatement() method where you define the target
> > tables. However, it’s in the same database (datasource) per pipeline.
> >
> > You can define several datasources and use a different datasource in
> > each JdbcIO write, meaning that you can divide the work into
> > sub-pipelines.
> >
> > Regards
> > JB
> >
> > > On Feb 29, 2020, at 17:52, Vasu Gupta <[email protected]> wrote:
> > >
> > > Hey folks,
> > >
> > > Can we use JdbcIO for writing data to multiple schemas (for a
> > > Postgres database) dynamically using the Apache Beam Java framework?
> > > Currently, I can't find any property that I could set on the JdbcIO
> > > transform for providing a schema, or maybe I am missing something.
> > >
> > > Thanks
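
For contrast with the JdbcIO limitation discussed above: JdbcIO.write() takes its insert statement as a plain string that is fixed when the pipeline graph is constructed, which is why per-element routing isn't possible today. A rough sketch of the current pattern (driver, connection URL, and table/column names are placeholders):

  import org.apache.beam.sdk.io.jdbc.JdbcIO;
  import org.apache.beam.sdk.values.KV;
  import org.apache.beam.sdk.values.PCollection;

  public class FixedStatementWrite {
    // Writes one collection of (name, value) pairs into one pre-created
    // table. The SQL string is baked into the pipeline graph at construction
    // time, so writing to a new table or new columns means building another
    // transform like this one and redeploying, as described above.
    static void writeTo(PCollection<KV<String, Double>> features, String table) {
      features.apply("WriteTo_" + table,
          JdbcIO.<KV<String, Double>>write()
              .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                  "org.postgresql.Driver",
                  "jdbc:postgresql://localhost:5432/feast"))
              // Fixed at graph construction; cannot vary per element.
              .withStatement("insert into " + table + " (name, value) values (?, ?)")
              .withPreparedStatementSetter((element, statement) -> {
                statement.setString(1, element.getKey());
                statement.setDouble(2, element.getValue());
              }));
    }
  }

JB's suggestion maps onto this as one such write branch per known table (or per datasource), all built when the job is submitted.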
