reuvenlax commented on issue #23291:
URL: https://github.com/apache/beam/issues/23291#issuecomment-1322579939

   This is an interesting situation. The contract with the Storage API is that 
the descriptor passed in when you open the connection must be compatible with 
the actual BQ table schema, and this descriptor is derived from the schema 
returned by getSchema. In this case, BigQuery rejects the connection as we 
establish it (due to the schema mismatch), before we've sent a single record, 
which is why nothing ends up in the failedInserts collection.
   
   In general, this seems like an unsupported feature. The schema returned by 
getSchema must be compatible with the actual BQ table schema. If it is not, 
various things could go wrong in strange ways. 
   
   Can you explain how things get out of sync here? Adding new REQUIRED columns 
to a table is not allowed by BigQuery, so this isn't simply a case where the 
schema service is returning an old value for the schema.
   
   Another note: if you are calling an external service in getSchema, make sure 
the value is well cached locally. getSchema is called potentially on every 
record, so this could cause major performance issues in your pipeline.
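
   To illustrate the caching point, here is a minimal sketch of memoizing the 
lookup per destination. The class name CachedSchemaLookup and the remoteLookup 
function are hypothetical and not part of Beam; the sketch also never expires 
entries, so a real pipeline would probably want a TTL or an explicit refresh.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Beam API): memoizes schema lookups per
 * destination so the external service is called at most once per key.
 */
final class CachedSchemaLookup<DestT, SchemaT> {
  // Thread-safe map from destination to the schema fetched for it.
  private final Map<DestT, SchemaT> cache = new ConcurrentHashMap<>();

  // Assumed to wrap the slow call to the external schema service.
  private final Function<DestT, SchemaT> remoteLookup;

  CachedSchemaLookup(Function<DestT, SchemaT> remoteLookup) {
    this.remoteLookup = remoteLookup;
  }

  /** Returns the cached schema, invoking remoteLookup only on a cache miss. */
  SchemaT get(DestT destination) {
    return cache.computeIfAbsent(destination, remoteLookup);
  }
}
```

   getSchema could then delegate to something like lookup.get(destination). 
Since DynamicDestinations objects are serialized, the cache would likely need 
to live in a static or lazily initialized transient field.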

