ahmedabu98 commented on code in PR #32529:
URL: https://github.com/apache/beam/pull/32529#discussion_r1799340244

##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/providers/BigQueryStorageWriteApiSchemaTransformProvider.java:
##########
@@ -365,6 +394,34 @@ public TableSchema getSchema(String destination) {
     }
   }
 
+  private static class CdcWritesDynamicDestination extends RowDynamicDestinations {

Review Comment:
   For simplicity, I suggest we expand `RowDynamicDestinations` to include an optional primary key, i.e. `RowDynamicDestinations(Schema schema, @Nullable List<String> primaryKey)`. We can then add a `getTableConstraints()` implementation to `RowDynamicDestinations`: if a primary key exists, return an appropriate `TableConstraints` object; otherwise return null.



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/providers/BigQueryStorageWriteApiSchemaTransformProvider.java:
##########
@@ -498,5 +563,52 @@ BigQueryIO.Write<Row> createStorageWriteApiTransform(Schema schema) {
     return write;
   }
+
+  BigQueryIO.Write<Row> validateAndIncludeCDCInformation(
+      BigQueryIO.Write<Row> write, Schema schema) {
+    checkArgument(
+        schema.getFieldNames().containsAll(Arrays.asList(ROW_PROPERTY_MUTATION_INFO, "record")),
+        "When writing using CDC functionality, we expect Row Schema with a "
+            + "\""
+            + ROW_PROPERTY_MUTATION_INFO
+            + "\" Row field and a \"record\" Row field.");
+    checkArgument(
+        schema
+            .getField(ROW_PROPERTY_MUTATION_INFO)
+            .getType()
+            .getRowSchema()
+            .equals(ROW_SCHEMA_MUTATION_INFO),
+        "When writing using CDC functionality, we expect a \""
+            + ROW_PROPERTY_MUTATION_INFO
+            + "\" field of Row type with fields \""
+            + ROW_PROPERTY_MUTATION_TYPE
+            + "\" and \""
+            + ROW_PROPERTY_MUTATION_SQN
+            + "\" both of type string.");
+
+    String tableDestination = null;
+
+    if (configuration.getTable().equals(DYNAMIC_DESTINATIONS)) {
+      validateDynamicDestinationsExpectedSchema(schema);
+    } else {
+      tableDestination = configuration.getTable();
+    }
+
+    return write
+        .to(
+            new CdcWritesDynamicDestination(
+                schema.getField("record").getType().getRowSchema(),
+                tableDestination,
+                configuration.getPrimaryKey()))

Review Comment:
   After resolving to just `RowDynamicDestinations`, we can remove these lines. The rest of this part looks good.



##########
sdks/python/apache_beam/io/external/xlang_bigqueryio_it_test.py:
##########
@@ -259,11 +259,11 @@ def test_write_with_beam_rows_cdc(self):
 
     rows_with_cdc = [
         beam.Row(
-            cdc_info=beam.Row(
+            row_mutation_info=beam.Row(

Review Comment:
   Let's add an identical test for Python dicts (I remember you had one previously with the callable function, but we can leave that part out).



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/providers/BigQueryStorageWriteApiSchemaTransformProvider.java:
##########
@@ -466,11 +530,11 @@ BigQueryIO.Write<Row> createStorageWriteApiTransform(Schema schema) {
             .withFormatFunction(BigQueryUtils.toTableRow())
             .withWriteDisposition(WriteDisposition.WRITE_APPEND);
 
-    if (configuration.getTable().equals(DYNAMIC_DESTINATIONS)) {
-      checkArgument(
-          schema.getFieldNames().equals(Arrays.asList("destination", "record")),
-          "When writing to dynamic destinations, we expect Row Schema with a "
-              + "\"destination\" string field and a \"record\" Row field.");
+    // in case CDC writes are configured we validate and include them in the configuration
+    if (Optional.ofNullable(configuration.getUseCdcWrites()).orElse(false)) {
+      write = validateAndIncludeCDCInformation(write, schema);

Review Comment:
   After resolving to just `RowDynamicDestinations`, we can move this check further down. That is, the order should be:
   - if DynamicDestinations, apply `to(RowDynamicDestinations)`
   - else, apply `to(table)`
   - ...
   - if CdcWrites, apply the CDC information


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
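
The first review comment's suggestion (folding an optional primary key into `RowDynamicDestinations` and deriving `getTableConstraints()` from it) can be illustrated with a small stand-alone sketch. Everything below is a stand-in, not Beam code: the nested `TableConstraints` class here only models `com.google.api.services.bigquery.model.TableConstraints`, and the class name `RowDynamicDestinationsSketch` is hypothetical; in the actual provider the nullable `primaryKey` field and the `getTableConstraints()` override would live on `RowDynamicDestinations` itself.

```java
import java.util.List;

// Hypothetical sketch of the reviewer's suggestion: carry an optional primary key
// on the dynamic-destinations object instead of a separate CDC subclass.
public class RowDynamicDestinationsSketch {

  // Stand-in for com.google.api.services.bigquery.model.TableConstraints.
  static class TableConstraints {
    final List<String> primaryKeyColumns;

    TableConstraints(List<String> primaryKeyColumns) {
      this.primaryKeyColumns = primaryKeyColumns;
    }
  }

  // May be null when CDC writes are not configured.
  private final List<String> primaryKey;

  RowDynamicDestinationsSketch(List<String> primaryKey) {
    this.primaryKey = primaryKey;
  }

  // If a primary key was configured, expose it as table constraints;
  // otherwise return null so non-CDC writes behave exactly as before.
  TableConstraints getTableConstraints() {
    return (primaryKey == null || primaryKey.isEmpty())
        ? null
        : new TableConstraints(primaryKey);
  }

  public static void main(String[] args) {
    RowDynamicDestinationsSketch noCdc = new RowDynamicDestinationsSketch(null);
    RowDynamicDestinationsSketch cdc = new RowDynamicDestinationsSketch(List.of("id"));
    System.out.println(noCdc.getTableConstraints() == null);         // true
    System.out.println(cdc.getTableConstraints().primaryKeyColumns); // [id]
  }
}
```

Returning null when no primary key is configured keeps plain (non-CDC) writes on their existing code path, which is what makes the merged class safe to use for both cases.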
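
The Python review comment asks for a dict-based twin of `test_write_with_beam_rows_cdc`. A sketch of what those input elements might look like is below; note the inner field names (`mutation_type`, `change_sequence_number`) and all sample values are assumptions mirroring the Java-side constants `ROW_PROPERTY_MUTATION_TYPE` and `ROW_PROPERTY_MUTATION_SQN` — the diff above truncates before showing them, so check the provider for the exact names.

```python
# Hypothetical dict-based equivalent of the beam.Row CDC test elements.
# Field names inside "row_mutation_info" are assumed, not confirmed by the diff.
dicts_with_cdc = [
    {
        "row_mutation_info": {
            "mutation_type": "UPSERT",          # assumed field name and value
            "change_sequence_number": "AAA/2",  # assumed field name and value
        },
        "record": {"name": "cdc_test", "value": 5},  # assumed sample record
    },
    {
        "row_mutation_info": {
            "mutation_type": "UPSERT",
            "change_sequence_number": "AAA/1",
        },
        "record": {"name": "cdc_test", "value": 3},
    },
]
```

In the test itself these elements would be fed through `beam.Create(dicts_with_cdc)` into the storage-write transform with CDC writes enabled, asserting the same final table contents as the `beam.Row` variant.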