voonhous commented on code in PR #17671:
URL: https://github.com/apache/hudi/pull/17671#discussion_r2644694278
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java:
##########
@@ -183,59 +180,53 @@ public void testWriteDuringCompaction(String payloadClass, HoodieIndex.IndexType
@ParameterizedTest
@MethodSource("writeLogTest")
   public void testWriteLogDuringCompaction(boolean enableMetadataTable, boolean enableTimelineServer) throws IOException {
-    try {
-      //disable for this test because it seems like we process mor in a different order?
-      jsc().hadoopConfiguration().set(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(), "false");
Review Comment:
I don't quite understand how the changes here disable MDT. `writeLogTest` is
still supplying flags to enable/disable MDT in:
```java
HoodieMetadataConfig.newBuilder().enable(enableMetadataTable).build()
```
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/TestHoodieClientMultiWriter.java:
##########
@@ -457,7 +457,7 @@ private void testHoodieClientBasicMultiWriterWithEarlyConflictDetection(String t
setUpMORTestTable();
}
- int heartBeatIntervalForCommit4 = 10 * 1000;
+ int heartBeatIntervalForCommit4 = 3 * 1000;
Review Comment:
I see a `Thread.sleep` for this.
Would it be possible to manually set the heartbeat file's modification time
instead of the `Thread.sleep` that uses this interval? For example:
```java
StoragePath heartbeatFilePath = new StoragePath(
    HoodieTableMetaClient.getHeartbeatFolderPath(basePath) + StoragePath.SEPARATOR + nextCommitTime3);
storage.create(heartbeatFilePath, true);
// Wait for the heartbeat of failed commitTime3 "003" to expire.
// Otherwise commit4 can still see a conflict with the failed write 003.
// Thread.sleep(heartBeatIntervalForCommit4 * 2);
storage.setLastModifiedTime(heartbeatFilePath,
    System.currentTimeMillis() - heartBeatIntervalForCommit4 * 2);
```
Or would that be too complex, if manually setting this file's modification
time forces us to adjust the timestamps of all other Hudi-related meta files too?
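For illustration, the general trick of backdating a file's mtime instead of sleeping can be shown with plain JDK APIs; this is a self-contained sketch, not Hudi's `HoodieStorage` API, and the class name and temp-file path are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class BackdateMtimeDemo {

  // Backdate the file's modification time by intervalMs * 2, so any check of
  // the form "is this heartbeat older than the interval?" sees it as expired
  // immediately, with no Thread.sleep in the test.
  static void expireHeartbeat(Path heartbeatFile, long intervalMs) throws IOException {
    long expiredMillis = System.currentTimeMillis() - intervalMs * 2;
    Files.setLastModifiedTime(heartbeatFile, FileTime.fromMillis(expiredMillis));
  }

  public static void main(String[] args) throws IOException {
    Path heartbeat = Files.createTempFile("heartbeat", ".tmp");
    long intervalMs = 3_000;
    expireHeartbeat(heartbeat, intervalMs);
    long ageMs = System.currentTimeMillis() - Files.getLastModifiedTime(heartbeat).toMillis();
    System.out.println(ageMs >= intervalMs ? "expired" : "alive"); // prints "expired"
    Files.deleteIfExists(heartbeat);
  }
}
```

Whether this carries over cleanly depends on whether the expiry check reads the file's mtime (as sketched here) rather than wall-clock state kept elsewhere.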
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableCompaction.java:
##########
@@ -141,8 +140,6 @@ public void testWriteDuringCompaction(String payloadClass, HoodieIndex.IndexType
.withMaxNumDeltaCommitsBeforeCompaction(1)
.compactionSmallFileSize(0)
.build())
-        .withStorageConfig(HoodieStorageConfig.newBuilder()
-            .parquetMaxFileSize(1024).build())
Review Comment:
How does this optimize the tests? IIUC this value is in bytes, so did the
1024-byte cap lead to too many small parquet files being created, causing overhead?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]