Re: [PR] Enable configuration of a CDC mutation info Callable for CDC Writes into BigQuery [beam]

via GitHub Mon, 02 Dec 2024 10:51:34 -0800


prodriguezdefino commented on code in PR #32878:
URL: https://github.com/apache/beam/pull/32878#discussion_r1866356320



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -2550,7 +2577,7 @@ def __init__(
       use_at_least_once=False,
       with_auto_sharding=False,
       num_storage_api_streams=0,
-      use_cdc_writes: bool = False,
+      use_cdc_writes: UseCdcWrites = False,

Review Comment:
   Originally this change was part of #32529. Given the push to get it
included in the v2.60.0 release, @ahmedabu98 recommended splitting it into two
separate PRs.
   
   Before #32529, the Python SDK was not capable of using CDC writes into BQ.
This change brings parity with BigQueryIO by exposing a lambda that adds the
Row Mutation information to each row to be ingested. Currently, an SDK user
needs to know how to structure their Rows for CDC ingestion or, if they use
Dicts as their data format, the schema they provide has to include the row
mutation information, so it no longer matches the actual BigQuery table schema
they want to write to (otherwise the xlang protocol wouldn't work).
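
   To make the difference concrete, here is a minimal sketch in plain Python
(no Beam dependency) of what such a lambda does. The helper
`wrap_with_mutation_info` and the callable `upsert_by_version` are hypothetical
names used only for illustration, not the API added in this PR. The field
names `record`, `row_mutation_info`, `mutation_type` and
`change_sequence_number` are assumed to match what the sink expects, and the
hex-encoded sequence number is likewise an assumption.

```python
from typing import Any, Callable, Dict

# Hypothetical alias: a callable that derives mutation info from a plain row.
MutationInfoFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def wrap_with_mutation_info(
    row: Dict[str, Any], mutation_info_fn: MutationInfoFn
) -> Dict[str, Any]:
    """Nest the user's row next to its mutation info.

    Assumed layout: the original row goes under 'record' and the CDC
    metadata under 'row_mutation_info'.
    """
    return {
        "row_mutation_info": mutation_info_fn(row),
        "record": row,
    }


def upsert_by_version(row: Dict[str, Any]) -> Dict[str, Any]:
    """Example callable: always UPSERT, ordered by the row's 'version'."""
    return {
        "mutation_type": "UPSERT",
        # Assumption: the sequence number is passed as a hex string.
        "change_sequence_number": format(row["version"], "x"),
    }


# The user's row and schema stay aligned with the BigQuery table;
# only the wrapped element carries the CDC metadata.
plain_row = {"id": 1, "name": "a", "version": 26}
wrapped = wrap_with_mutation_info(plain_row, upsert_by_version)
```

   With a callable like this the user keeps writing plain Dicts that match
the table schema, and the wrapping into the CDC structure happens in one place
instead of in every pipeline.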



