Hey folks, 

While using BigQueryIO to insert into roughly 10k tables, I found an issue 
with its local caching for table creation. BigQueryIO first looks a table up 
in its local cache and only then decides whether the table needs to be 
created. The problem shows up when inserting into thousands of tables in 
real time: since we deploy a fresh Dataflow pipeline once a week, the local 
cache starts out empty, and it takes a long time just to rebuild it for all 
10k tables, even though those tables were already created in BigQuery.

The solution I would propose is an option to back this cache with an 
external caching service such as Redis or Memcached, so that we don't have 
to rebuild the cache from scratch after every fresh deployment of the 
pipeline.
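To make the idea concrete, here is a minimal sketch of the two-level lookup I have in mind. All names are hypothetical, and a plain dict stands in for the Redis/Memcached store so the example is self-contained; a real implementation would use the actual client and BigQueryIO's existing table-creation path.

```python
class TableCreationCache:
    """Sketch: check a per-worker local cache first, then a shared
    external store, and only call out to BigQuery on a double miss."""

    def __init__(self, external_store, create_fn):
        self._local = set()              # per-worker cache (what BigQueryIO has today)
        self._external = external_store  # shared store that survives pipeline redeploys
        self._create_fn = create_fn      # invoked only when neither cache knows the table

    def ensure_table(self, table_id):
        if table_id in self._local:
            return False                 # known locally; nothing to do
        if self._external.get(table_id):
            self._local.add(table_id)    # warm the local cache from the shared store
            return False                 # known externally; skip the create call
        self._create_fn(table_id)        # e.g. the tables.insert call against BigQuery
        self._external[table_id] = "1"   # record creation for future deployments
        self._local.add(table_id)
        return True
```

With this, a freshly deployed pipeline (empty local cache) that shares the external store with the previous deployment never re-issues create calls for tables that already exist:

```python
created = []
shared_store = {}                        # stands in for Redis/Memcached

first_deploy = TableCreationCache(shared_store, created.append)
first_deploy.ensure_table("project:dataset.events_001")   # creates the table

second_deploy = TableCreationCache(shared_store, created.append)
second_deploy.ensure_table("project:dataset.events_001")  # external hit, no create
```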
