nsivabalan commented on code in PR #9482:
URL: https://github.com/apache/hudi/pull/9482#discussion_r1306300959


##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java:
##########
@@ -52,34 +57,55 @@ public class BigQuerySyncTool extends HoodieSyncTool {
 
   private static final Logger LOG = 
LoggerFactory.getLogger(BigQuerySyncTool.class);
 
-  public final BigQuerySyncConfig config;
-  public final String tableName;
-  public final String manifestTableName;
-  public final String versionsTableName;
-  public final String snapshotViewName;
+  private final BigQuerySyncConfig config;
+  private final String tableName;
+  private final String manifestTableName;
+  private final String versionsTableName;
+  private final String snapshotViewName;
+  private final ManifestFileWriter manifestFileWriter;
+  private final HoodieBigQuerySyncClient bqSyncClient;
+  private final HoodieTableMetaClient metaClient;
+  private final BigQuerySchemaResolver bqSchemaResolver;
 
   public BigQuerySyncTool(Properties props) {
-    super(props);
+    // will build file writer, client, etc. from configs
+    this(props, null, null, null, null);
+  }
+
+  @VisibleForTesting // allows us to pass in mocks for the writer and client
+  BigQuerySyncTool(Properties properties, ManifestFileWriter 
manifestFileWriter, HoodieBigQuerySyncClient providedBqSyncClient, 
HoodieTableMetaClient providedMetaClient,
+                   BigQuerySchemaResolver providedBqSchemaResolver) {
+    super(properties);
     this.config = new BigQuerySyncConfig(props);
     this.tableName = config.getString(BIGQUERY_SYNC_TABLE_NAME);
     this.manifestTableName = tableName + "_manifest";
     this.versionsTableName = tableName + "_versions";
     this.snapshotViewName = tableName;
+    this.bqSyncClient = providedBqSyncClient == null ? new 
HoodieBigQuerySyncClient(config) : providedBqSyncClient;
+    // reuse existing meta client if not provided (only test cases will 
provide their own meta client)

Review Comment:
   then, how do we ensure we can catch issues within 
buildManifestFileWriterFromConfig, HoodieBigQuerySyncClient() constructor etc, 
if we override all these w/ provided entities in tests? 
   



##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java:
##########
@@ -52,34 +57,55 @@ public class BigQuerySyncTool extends HoodieSyncTool {
 
   private static final Logger LOG = 
LoggerFactory.getLogger(BigQuerySyncTool.class);
 
-  public final BigQuerySyncConfig config;
-  public final String tableName;
-  public final String manifestTableName;
-  public final String versionsTableName;
-  public final String snapshotViewName;
+  private final BigQuerySyncConfig config;
+  private final String tableName;
+  private final String manifestTableName;
+  private final String versionsTableName;
+  private final String snapshotViewName;
+  private final ManifestFileWriter manifestFileWriter;
+  private final HoodieBigQuerySyncClient bqSyncClient;
+  private final HoodieTableMetaClient metaClient;
+  private final BigQuerySchemaResolver bqSchemaResolver;
 
   public BigQuerySyncTool(Properties props) {
-    super(props);
+    // will build file writer, client, etc. from configs
+    this(props, null, null, null, null);
+  }
+
+  @VisibleForTesting // allows us to pass in mocks for the writer and client
+  BigQuerySyncTool(Properties properties, ManifestFileWriter 
manifestFileWriter, HoodieBigQuerySyncClient providedBqSyncClient, 
HoodieTableMetaClient providedMetaClient,
+                   BigQuerySchemaResolver providedBqSchemaResolver) {
+    super(properties);
     this.config = new BigQuerySyncConfig(props);
     this.tableName = config.getString(BIGQUERY_SYNC_TABLE_NAME);
     this.manifestTableName = tableName + "_manifest";
     this.versionsTableName = tableName + "_versions";
     this.snapshotViewName = tableName;
+    this.bqSyncClient = providedBqSyncClient == null ? new 
HoodieBigQuerySyncClient(config) : providedBqSyncClient;
+    // reuse existing meta client if not provided (only test cases will 
provide their own meta client)

Review Comment:
   I also don't have a good way around this. my only concern is, if we start to 
set precedence, down the line everyone will start to follow similar pattern and 
might miss out on some actual issues in the source code. May be in this patch, 
I don't see major gaps. but once we set precedence, its hard to keep track 
(since we don't review every patch in OSS)



##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/HoodieBigQuerySyncClient.java:
##########
@@ -147,6 +159,31 @@ public void createManifestTable(String tableName, String 
sourceUri) {
     }
   }
 
+  /**
+   * Updates the schema for the given table if the schema has changed.
+   * @param tableName name of the table in BigQuery
+   * @param schema latest schema for the table
+   */
+  public void updateTableSchema(String tableName, Schema schema, List<String> 
partitionFields) {
+    Table existingTable = bigquery.getTable(TableId.of(projectId, datasetName, 
tableName));
+    ExternalTableDefinition definition = existingTable.getDefinition();
+    Schema remoteTableSchema = definition.getSchema();
+    // Add the partition fields into the schema to avoid conflicts while 
updating
+    List<Field> updatedTableFields = remoteTableSchema.getFields().stream()
+        .filter(field -> partitionFields.contains(field.getName()))
+        .collect(Collectors.toList());
+    updatedTableFields.addAll(schema.getFields());
+    Schema finalSchema = Schema.of(updatedTableFields);
+    if (definition.getSchema() != null && 
definition.getSchema().equals(finalSchema)) {
+      return; // No need to update schema.
+    }
+    Table updatedTable = existingTable.toBuilder()

Review Comment:
   can you add java docs around this in lines 163 ish. 



##########
hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java:
##########
@@ -52,34 +57,55 @@ public class BigQuerySyncTool extends HoodieSyncTool {
 
   private static final Logger LOG = 
LoggerFactory.getLogger(BigQuerySyncTool.class);
 
-  public final BigQuerySyncConfig config;
-  public final String tableName;
-  public final String manifestTableName;
-  public final String versionsTableName;
-  public final String snapshotViewName;
+  private final BigQuerySyncConfig config;
+  private final String tableName;
+  private final String manifestTableName;
+  private final String versionsTableName;
+  private final String snapshotViewName;
+  private final ManifestFileWriter manifestFileWriter;
+  private final HoodieBigQuerySyncClient bqSyncClient;
+  private final HoodieTableMetaClient metaClient;
+  private final BigQuerySchemaResolver bqSchemaResolver;
 
   public BigQuerySyncTool(Properties props) {
-    super(props);
+    // will build file writer, client, etc. from configs
+    this(props, null, null, null, null);
+  }
+
+  @VisibleForTesting // allows us to pass in mocks for the writer and client

Review Comment:
   yeah. generally we don't recommend taking the route of VisibleForTesting, 
esply with constructor. 
   Isnt't there any other way to test this? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to