Hey Reuven, yes, you are correct that BigQueryIO is working as intended, but 
the issue is that, since it's a local cache, it gets rebuilt from scratch 
every time the pipeline is redeployed, which is very time consuming for 
thousands of tables.
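
To make this concrete, here is a rough sketch of the kind of shared cache I 
have in mind (purely illustrative, assuming a Jedis client; the class, method, 
and key names below are made up and are not part of Beam today):

import redis.clients.jedis.Jedis;

/**
 * Illustrative only: a "known tables" cache that survives pipeline
 * redeploys, backed by a Redis set. Nothing here exists in Beam today.
 */
public class SharedTableCache {

  // Hypothetical Redis key holding the set of table specs already created.
  private static final String KNOWN_TABLES_KEY = "beam:bq:created-tables";

  private final Jedis jedis;

  public SharedTableCache(String redisHost, int redisPort) {
    this.jedis = new Jedis(redisHost, redisPort);
  }

  /** True if an earlier pipeline run already created (and recorded) this table. */
  public boolean isKnown(String tableSpec) {
    return jedis.sismember(KNOWN_TABLES_KEY, tableSpec);
  }

  /** Record a table after a successful create so future deployments skip the check. */
  public void markCreated(String tableSpec) {
    jedis.sadd(KNOWN_TABLES_KEY, tableSpec);
  }
}

The idea would be for BigQueryIO to consult isKnown(tableSpec) before falling 
back to its per-worker cache and the create path, and to call 
markCreated(tableSpec) after a successful create, so a freshly deployed 
pipeline starts with a warm cache instead of re-checking all 10k tables.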

On 2020/12/03 17:58:04, Reuven Lax <[email protected]> wrote: 
> What exactly is the issue? If the cache is empty, then BigQueryIO will try
> and create the table again, and the creation will fail since the table
> exists. This is working as intended.
> 
> The only reason for the cache is so that BigQueryIO doesn't continuously
> hammer BigQuery with creation requests every second.
> 
> On Wed, Dec 2, 2020 at 3:20 PM Vasu Gupta <[email protected]> wrote:
> 
> > Hey folks,
> >
> > While using BigQueryIO to insert into 10k tables, I found an issue in its
> > local caching technique for table creation. A table is first looked up in
> > BigQueryIO's local cache, which then decides whether to create the table
> > or not. The main issue shows up when inserting into thousands of tables:
> > suppose we have 10k tables to insert into in real time, and we redeploy a
> > fresh Dataflow pipeline once a week. The local cache will be empty, and it
> > will take a huge amount of time just to rebuild that cache for 10k tables,
> > even though those 10k tables were already created in BigQuery.
> >
> > The solution I would propose is to provide an option to use an external
> > caching service like Redis/Memcached, so that we don't have to rebuild the
> > cache again and again after each fresh deployment of the pipeline.
> >
> 
