nsivabalan commented on a change in pull request #4186:
URL: https://github.com/apache/hudi/pull/4186#discussion_r761017470
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -57,6 +57,13 @@
+ " to delete older file slices. It's recommended to enable this, to
ensure metadata and data storage"
+ " growth is bounded.");
+ public static final ConfigProperty<String> AUTO_ARCHIVE = ConfigProperty
+ .key("hoodie.archive.automatic")
Review comment:
Can we rename this to hoodie.auto.archive, to be in line with other configs (e.g. hoodie.auto.clean)?
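For example, the renamed property could look like the following (a sketch only; the default value and documentation string here are illustrative, not taken from this PR):

```java
public static final ConfigProperty<String> AUTO_ARCHIVE = ConfigProperty
    .key("hoodie.auto.archive")
    .defaultValue("true")
    .withDocumentation("When enabled, the archival table service is invoked immediately after each commit,"
        + " to move older completed instants from the active timeline to the archived timeline.");
```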
##########
File path:
hudi-client/hudi-spark-client/src/test/resources/log4j-surefire.properties
##########
@@ -27,5 +27,5 @@ log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
log4j.appender.CONSOLE.filter.a=org.apache.log4j.varia.LevelRangeFilter
log4j.appender.CONSOLE.filter.a.AcceptOnMatch=true
-log4j.appender.CONSOLE.filter.a.LevelMin=WARN
+log4j.appender.CONSOLE.filter.a.LevelMin=INFO
Review comment:
We need to revert all these log4j changes; I assume they were made for local testing.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -743,6 +741,27 @@ public HoodieCleanMetadata clean(boolean skipLocking) {
return clean(HoodieActiveTimeline.createNewInstantTime(), skipLocking);
}
+ /**
+ * Trigger archival for the table. This ensures that the number of commits do not explode
+ * and keep increasing unbounded over time.
+ * @param table table to commit on.
+ */
+ protected void archive(HoodieTable<T, I, K, O> table) {
+ try {
+ // We cannot have unbounded commit files. Archive commits if we have to archive
+ HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
+ archiveLog.archiveIfRequired(context);
+ } catch (IOException ioe) {
+ throw new HoodieIOException("Failed to archive", ioe);
+ }
+ }
+
+ public void archive() {
Review comment:
Please add javadocs for this public method.
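Something along these lines would do (a sketch; the wording is mine, adapted from the javadoc on the protected overload above, not from the PR):

```java
/**
 * Triggers archival for the table, if required per the configured archival policy.
 * This bounds the growth of the active timeline by moving older completed
 * instants into the archived timeline.
 */
public void archive() {
```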
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java
##########
@@ -144,6 +144,9 @@ protected void commit(HoodieData<HoodieRecord> hoodieDataRecords, String partiti
metadataMetaClient.reloadActiveTimeline();
}
List<WriteStatus> statuses = writeClient.upsertPreppedRecords(recordRDD, instantTime).collect();
+ if (canTriggerTableService) {
Review comment:
Wondering if we should move this to after line 161, i.e. after compaction and cleaning. Even in the regular flow, cleaning comes first, followed by archival.
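Concretely, the suggestion amounts to an ordering roughly like the following inside the metadata writer's commit path (a sketch only; the compaction and cleaning helper names are illustrative, not the exact methods in this class):

```java
if (canTriggerTableService) {
  // mirror the regular write flow: compaction, then cleaning, then archival last
  compactIfNecessary(writeClient, instantTime);  // illustrative helper name
  cleanIfNecessary(writeClient, instantTime);    // illustrative helper name
  writeClient.archive();
}
```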
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -250,6 +251,25 @@ public void testTableOperations(HoodieTableType tableType, boolean enableFullSca
validateMetadata(testTable, emptyList(), true);
}
+ @Test
+ public void testMetadataConcurrentWriters() throws Exception {
+ init(HoodieTableType.MERGE_ON_READ);
+ doWriteOperation(testTable, "0000001", INSERT);
+ AtomicInteger commitTime = new AtomicInteger(2);
+ int i = 1;
+ for (; i <= 50; i++) {
Review comment:
I guess with the fix we can simplify the test: whenever the metadata table becomes eligible for archival, trigger a table service and ensure archival does not kick in; then trigger a regular write and ensure archival does kick in. In any case, the existing test is not very deterministic.
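A deterministic version could be structured roughly like this (a sketch only; doWriteOperation/INSERT come from the existing test class, but the threshold variable and the two trigger/assert helpers are illustrative names, not existing APIs):

```java
@Test
public void testMetadataArchivalGating() throws Exception {
  init(HoodieTableType.MERGE_ON_READ);
  // write enough commits that the metadata table becomes eligible for archival
  for (int i = 1; i <= archivalThreshold + 1; i++) {  // archivalThreshold: illustrative
    doWriteOperation(testTable, String.format("%07d", i), INSERT);
  }
  // trigger only a table service; archival on the metadata table should not kick in
  triggerTableServiceOnly();           // illustrative helper
  assertNoArchivalOnMetadataTable();   // illustrative helper
  // a regular write should now allow archival to kick in
  doWriteOperation(testTable, String.format("%07d", archivalThreshold + 2), INSERT);
  assertArchivalOnMetadataTable();     // illustrative helper
}
```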
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]