What exactly is the issue? If the cache is empty, then BigQueryIO will try to create the table again, and the creation will fail since the table already exists. This is working as intended.
The only reason for the cache is so that BigQueryIO doesn't continuously hammer BigQuery with creation requests every second.

On Wed, Dec 2, 2020 at 3:20 PM Vasu Gupta <[email protected]> wrote:

> Hey folks,
>
> While using BigQueryIO to insert into 10k tables, I found an issue in its
> local caching technique for table creation. Tables are first looked up in
> BigQueryIO's local cache, which determines whether to create the table.
> The main issue arises when inserting into thousands of tables: suppose we
> have 10k tables to insert into in real time. Since we deploy a fresh
> Dataflow pipeline once a week, the local cache starts out empty, and it
> takes a long time just to rebuild that cache for 10k tables even though
> those 10k tables were already created in BigQuery.
>
> The solution I could propose for this is to provide an option for using an
> external caching service like Redis or Memcached, so that we don't have to
> rebuild the cache again and again after a fresh deployment of the pipeline.
>
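For context, the caching pattern under discussion can be sketched roughly as below. This is an illustrative sketch only, not Beam's actual implementation: the class and method names (`TableCache`, `ensureTableExists`, `createTable`) are hypothetical, and the real code would call the BigQuery API and tolerate "already exists" errors rather than increment a counter.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a per-worker table-creation cache: each table spec
// triggers at most one creation attempt, so BigQuery is not hammered with
// repeated creation requests on every insert.
public class TableCache {
    private final ConcurrentHashMap<String, Boolean> createdTables = new ConcurrentHashMap<>();
    private final AtomicInteger creationCalls = new AtomicInteger();

    // Stand-in for the real BigQuery table-creation call; here it just counts
    // invocations. Real code would issue the API request and ignore an
    // "already exists" response.
    private void createTable(String tableSpec) {
        creationCalls.incrementAndGet();
    }

    public void ensureTableExists(String tableSpec) {
        // computeIfAbsent runs the creation lambda at most once per key,
        // even under concurrent calls.
        createdTables.computeIfAbsent(tableSpec, spec -> {
            createTable(spec);
            return Boolean.TRUE;
        });
    }

    public int creationCalls() {
        return creationCalls.get();
    }

    public static void main(String[] args) {
        TableCache cache = new TableCache();
        // 1000 inserts spread over 10 distinct tables...
        for (int i = 0; i < 1000; i++) {
            cache.ensureTableExists("project:dataset.table_" + (i % 10));
        }
        // ...yield only 10 creation attempts.
        System.out.println(cache.creationCalls());
    }
}
```

Note that this cache lives in worker memory, which is exactly the limitation the original mail describes: a fresh pipeline deployment starts with an empty cache and must re-verify every table once.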
