yihua commented on code in PR #18181:
URL: https://github.com/apache/hudi/pull/18181#discussion_r2796270870
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -1121,6 +1121,9 @@ protected Option<HoodieTableMetadataWriter> getMetadataWriter(
 * Deletes the metadata table if the writer disables metadata table with hoodie.metadata.enable=false
*/
public void maybeDeleteMetadataTable() {
Review Comment:
nit: the Javadoc above still only mentions `hoodie.metadata.enable=false` as
the trigger. Could you update it to mention the new `auto.delete.partitions`
config as well, so future readers understand why the method might be a no-op?
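For instance, something along these lines (the wording is only a suggestion, and the `hoodie.metadata.auto.delete.partitions` key assumes `METADATA_PREFIX` resolves to `hoodie.metadata`):

```java
/**
 * Deletes the metadata table if the writer disables the metadata table with
 * hoodie.metadata.enable=false. Note that when
 * hoodie.metadata.auto.delete.partitions is set to false, this method is a
 * no-op even if the metadata table is disabled, so the existing MDT is left
 * intact on storage.
 */
public void maybeDeleteMetadataTable() {
```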
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkTable.java:
##########
@@ -114,14 +110,9 @@ protected Option<HoodieTableMetadataWriter>
getMetadataWriter(
HoodieTableMetadataWriter metadataWriter = streamingWrites
?
SparkMetadataWriterFactory.createWithStreamingWrites(getContext().getStorageConf(),
config, failedWritesCleaningPolicy, getContext(),
Option.of(triggeringInstantTimestamp))
: SparkMetadataWriterFactory.create(getContext().getStorageConf(),
config, failedWritesCleaningPolicy, getContext(),
Option.of(triggeringInstantTimestamp), metaClient.getTableConfig());
- try {
- if (isMetadataTableExists || metaClient.getStorage().exists(
-
HoodieTableMetadata.getMetadataTableBasePath(metaClient.getBasePath()))) {
- isMetadataTableExists = true;
- return Option.of(metadataWriter);
- }
- } catch (IOException e) {
- throw new HoodieMetadataException("Checking existence of metadata
table failed", e);
+ if (isMetadataTableExists || metadataWriter.isInitialized()) {
Review Comment:
I understand these changes avoid deleting the metadata table when the feature flag is turned off. Have you checked whether the MDT partitions can become stale if the metadata config does not enable them, but the MDT still contains the corresponding partitions and the table config still marks them as available to read?
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkTable.java:
##########
@@ -101,7 +97,7 @@ protected Option<HoodieTableMetadataWriter> getMetadataWriter(
     if (isMetadataTable()) {
       return Option.empty();
     }
-    if (config.isMetadataTableEnabled()) {
+    if (getMetaClient().getTableConfig().isMetadataTableAvailable() || config.isMetadataTableEnabled()) {
Review Comment:
With the `isMetadataTableAvailable()` condition added, when
`config.isMetadataTableEnabled()` is false but the MDT exists on disk
(available in table config), we'll now create a metadata writer and return it.
Previously this path would have fallen to the `else` branch and called
`maybeDeleteMetadataTable()`. Is the intent that the
`isAutoDeleteMdtPartitionsEnabled` config guard in `maybeDeleteMetadataTable()`
alone handles this, or is this `isMetadataTableAvailable()` change specifically
meant to keep the writer alive when auto-delete is disabled? It might be worth
adding a comment clarifying the intent.
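For instance, a clarifying comment along these lines (wording is only a suggestion) would make the intent explicit at the call site:

```java
    // Create a writer when the MDT is enabled by the write config, or when it
    // already exists on storage (table config marks it available). In the latter
    // case we keep the writer alive so the existing MDT stays in sync rather than
    // being deleted, leaving any cleanup decision to maybeDeleteMetadataTable()
    // and its hoodie.metadata.auto.delete.partitions guard.
    if (getMetaClient().getTableConfig().isMetadataTableAvailable() || config.isMetadataTableEnabled()) {
```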
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkTable.java:
##########
@@ -114,14 +110,9 @@ protected Option<HoodieTableMetadataWriter> getMetadataWriter(
     HoodieTableMetadataWriter metadataWriter = streamingWrites
         ? SparkMetadataWriterFactory.createWithStreamingWrites(getContext().getStorageConf(), config, failedWritesCleaningPolicy, getContext(), Option.of(triggeringInstantTimestamp))
         : SparkMetadataWriterFactory.create(getContext().getStorageConf(), config, failedWritesCleaningPolicy, getContext(), Option.of(triggeringInstantTimestamp), metaClient.getTableConfig());
-    try {
-      if (isMetadataTableExists || metaClient.getStorage().exists(
-          HoodieTableMetadata.getMetadataTableBasePath(metaClient.getBasePath()))) {
-        isMetadataTableExists = true;
-        return Option.of(metadataWriter);
-      }
-    } catch (IOException e) {
-      throw new HoodieMetadataException("Checking existence of metadata table failed", e);
+    if (isMetadataTableExists || metadataWriter.isInitialized()) {
Review Comment:
Let's also add a test case around this?
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -509,6 +509,18 @@ public final class HoodieMetadataConfig extends HoodieConfig {
           + "The index name either starts with or matches exactly can be one of the following: "
           + StringUtils.join(Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toList()), ", "));
+  // Config to enable/disable automatic deletion of MDT partitions
Review Comment:
nit: redundant comment; the config docs are self-explanatory.
##########
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java:
##########
@@ -509,6 +509,18 @@ public final class HoodieMetadataConfig extends HoodieConfig {
           + "The index name either starts with or matches exactly can be one of the following: "
           + StringUtils.join(Arrays.stream(MetadataPartitionType.values()).map(MetadataPartitionType::getPartitionPath).collect(Collectors.toList()), ", "));
+  // Config to enable/disable automatic deletion of MDT partitions
+  public static final ConfigProperty<Boolean> AUTO_DELETE_PARTITIONS = ConfigProperty
+      .key(METADATA_PREFIX + ".auto.delete.partitions")
+      .defaultValue(true)
Review Comment:
Let's add a test with this config disabled?
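For reference, the disabled setting the test should exercise would look like this in the writer config (assuming `METADATA_PREFIX` resolves to `hoodie.metadata`):

```properties
# Keep existing MDT partitions even when the metadata table is disabled on the writer
hoodie.metadata.auto.delete.partitions=false
```

The test could then disable `hoodie.metadata.enable` on a subsequent write and assert that the MDT base path still exists on storage.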
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]