lokeshj1703 commented on code in PR #7719:
URL: https://github.com/apache/hudi/pull/7719#discussion_r1193667584
##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveSyncClient.java:
##########
@@ -138,7 +138,7 @@ public void updateTableProperties(String tableName,
Map<String, String> tablePro
}
@Override
- public void updateSerdeProperties(String tableName, Map<String, String>
serdeProperties) {
+ public void updateStorageDescriptor(String tableName, Map<String, String>
serdeProperties, String inputFormatClassName) {
Review Comment:
We can make `inputFormatClassName` optional.
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncClient.java:
##########
@@ -164,4 +164,11 @@ public List<PartitionEvent>
getPartitionEvents(List<Partition> tablePartitions,
}
return events;
}
+
+ /**
+ * Update the Storage Descriptor in hive metastore.
+ */
+ public void updateStorageDescriptor(String tableName, Map<String, String>
serdeProperties, String inputFormatClassName) {
Review Comment:
We are renaming `updateSerdeProperties`in HoodieHiveSyncClient and creating
a new API here? Do we need backwards compatibility for this API? Should we make
sure the older API still works?
##########
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java:
##########
@@ -897,6 +903,56 @@ public void testSyncMergeOnReadWithStrategy(String
syncMode, HoodieSyncTableStra
}
}
+ @ParameterizedTest
+ @EnumSource(value = HoodieSyncTableStrategy.class, names = {"RO", "RT"})
+ public void
testSyncMergeOnReadWithStrategyWhenTableExist(HoodieSyncTableStrategy strategy)
throws IOException, URISyntaxException, InterruptedException {
Review Comment:
Can we add some more comments in the test describing the steps?
##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveSyncClient.java:
##########
@@ -147,18 +147,25 @@ public void updateSerdeProperties(String tableName,
Map<String, String> serdePro
Table table = client.getTable(databaseName, tableName);
StorageDescriptor storageDescriptor = table.getSd();
SerDeInfo serdeInfo = storageDescriptor.getSerdeInfo();
+ boolean hasDiff = false;
if (serdeInfo != null && serdeInfo.getParametersSize() ==
serdeProperties.size()) {
Map<String, String> parameters = serdeInfo.getParameters();
- boolean different = serdeProperties.entrySet().stream().anyMatch(e ->
+ hasDiff = serdeProperties.entrySet().stream().anyMatch(e ->
!parameters.containsKey(e.getKey()) ||
!parameters.get(e.getKey()).equals(e.getValue()));
- if (!different) {
- LOG.debug("Table " + tableName + " serdeProperties already up to
date, skip update serde properties.");
- return;
- }
+ }
+ if (inputFormatClassName != null &&
!storageDescriptor.getInputFormat().equalsIgnoreCase(inputFormatClassName)) {
+ storageDescriptor.setInputFormat(inputFormatClassName);
Review Comment:
Let's move the set call below with other setters of `storageDescriptor`.
##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java:
##########
@@ -325,13 +318,19 @@ private boolean syncSchema(String tableName, boolean
tableExists, boolean useRea
// Sync the table properties if the schema has changed
if (config.getString(HIVE_TABLE_PROPERTIES) != null ||
config.getBoolean(HIVE_SYNC_AS_DATA_SOURCE_TABLE)) {
syncClient.updateTableProperties(tableName, tableProperties);
- syncClient.updateSerdeProperties(tableName, serdeProperties);
+ syncClient.updateStorageDescriptor(tableName, serdeProperties, null);
LOG.info("Sync table properties for " + tableName + ", table
properties is: " + tableProperties);
}
schemaChanged = true;
} else {
LOG.info("No Schema difference for " + tableName);
}
+
+ if (hiveSyncTableStrategy != null &&
!hiveSyncTableStrategy.equalsIgnoreCase(HoodieSyncTableStrategy.ALL.name())) {
Review Comment:
What if strategy is set to ALL and both the ro and rt tables exist? I feel
behavior should be similar in both the cases.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]