nsivabalan commented on a change in pull request #4186:
URL: https://github.com/apache/hudi/pull/4186#discussion_r761017470
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
##########
@@ -57,6 +57,13 @@
+ " to delete older file slices. It's recommended to enable this, to
ensure metadata and data storage"
+ " growth is bounded.");
+ public static final ConfigProperty<String> AUTO_ARCHIVE = ConfigProperty
+ .key("hoodie.archive.automatic")
Review comment:
Can we rename this to hoodie.auto.archive, to be in line with other configs (e.g. hoodie.auto.clean)?
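For example, the renamed property could look like the following (a sketch only; the default value and documentation string here are illustrative, not taken from this PR):

```java
public static final ConfigProperty<String> AUTO_ARCHIVE = ConfigProperty
    .key("hoodie.auto.archive")
    .defaultValue("true")
    .withDocumentation("When enabled, the archival table service is invoked immediately after each commit,"
        + " to move older completed instants from the active timeline to the archived timeline.");
```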
##########
File path:
hudi-client/hudi-spark-client/src/test/resources/log4j-surefire.properties
##########
@@ -27,5 +27,5 @@ log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
log4j.appender.CONSOLE.filter.a=org.apache.log4j.varia.LevelRangeFilter
log4j.appender.CONSOLE.filter.a.AcceptOnMatch=true
-log4j.appender.CONSOLE.filter.a.LevelMin=WARN
+log4j.appender.CONSOLE.filter.a.LevelMin=INFO
Review comment:
We need to revert all these log4j changes; I assume they were made for local testing.
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##########
@@ -743,6 +741,27 @@ public HoodieCleanMetadata clean(boolean skipLocking) {
return clean(HoodieActiveTimeline.createNewInstantTime(), skipLocking);
}
+ /**
+ * Trigger archival for the table. This ensures that the number of commits do not explode
+ * and keep increasing unbounded over time.
+ * @param table table to commit on.
+ */
+ protected void archive(HoodieTable<T, I, K, O> table) {
+ try {
+ // We cannot have unbounded commit files. Archive commits if we have to archive
+ HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
+ archiveLog.archiveIfRequired(context);
+ } catch (IOException ioe) {
+ throw new HoodieIOException("Failed to archive", ioe);
+ }
+ }
+
+ public void archive() {
Review comment:
Please add javadocs for this public method.
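Something along these lines would do (a sketch; the wording is mine, adapted from the javadoc on the protected overload above, not from the PR):

```java
/**
 * Triggers archival for the table, if required per the configured archival policy.
 * This bounds the growth of the active timeline by moving older completed
 * instants into the archived timeline.
 */
public void archive() {
```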
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java
##########
@@ -144,6 +144,9 @@ protected void commit(HoodieData<HoodieRecord> hoodieDataRecords, String partiti
metadataMetaClient.reloadActiveTimeline();
}
List<WriteStatus> statuses = writeClient.upsertPreppedRecords(recordRDD, instantTime).collect();
+ if (canTriggerTableService) {
Review comment:
Wondering if we should move this to after line 161, i.e. after compaction and cleaning. Even in the regular flow, cleaning comes first, followed by archival.
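Concretely, the suggestion amounts to an ordering roughly like the following inside the metadata writer's commit path (a sketch only; the compaction and cleaning helper names are illustrative, not the exact methods in this class):

```java
if (canTriggerTableService) {
  // mirror the regular write flow: compaction, then cleaning, then archival last
  compactIfNecessary(writeClient, instantTime);  // illustrative helper name
  cleanIfNecessary(writeClient, instantTime);    // illustrative helper name
  writeClient.archive();
}
```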
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java
##########
@@ -250,6 +251,25 @@ public void testTableOperations(HoodieTableType tableType, boolean enableFullSca
validateMetadata(testTable, emptyList(), true);
}
+ @Test
+ public void testMetadataConcurrentWriters() throws Exception {
+ init(HoodieTableType.MERGE_ON_READ);
+ doWriteOperation(testTable, "0000001", INSERT);
+ AtomicInteger commitTime = new AtomicInteger(2);
+ int i = 1;
+ for (; i <= 50; i++) {
Review comment:
I guess with the fix we can simplify the test: whenever the metadata table becomes eligible for archival, trigger a table service and ensure archival does not kick in; then trigger a regular write and ensure archival does kick in. In any case, the existing test is not very deterministic.
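A deterministic version could be structured roughly like this (a sketch only; doWriteOperation/INSERT come from the existing test class, but the threshold variable and the two trigger/assert helpers are illustrative names, not existing APIs):

```java
@Test
public void testMetadataArchivalGating() throws Exception {
  init(HoodieTableType.MERGE_ON_READ);
  // write enough commits that the metadata table becomes eligible for archival
  for (int i = 1; i <= archivalThreshold + 1; i++) {  // archivalThreshold: illustrative
    doWriteOperation(testTable, String.format("%07d", i), INSERT);
  }
  // trigger only a table service; archival on the metadata table should not kick in
  triggerTableServiceOnly();           // illustrative helper
  assertNoArchivalOnMetadataTable();   // illustrative helper
  // a regular write should now allow archival to kick in
  doWriteOperation(testTable, String.format("%07d", archivalThreshold + 2), INSERT);
  assertArchivalOnMetadataTable();     // illustrative helper
}
```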
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]