xushiyan commented on a change in pull request #2845:
URL: https://github.com/apache/hudi/pull/2845#discussion_r636615564



##########
File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
##########
@@ -219,13 +221,19 @@ public static void createBaseFile(String basePath, String partitionPath, String
 
   public static void createBaseFile(String basePath, String partitionPath, String instantTime, String fileId, long length)
       throws Exception {
+    createBaseFile(basePath, partitionPath, instantTime, fileId, length, Instant.now().toEpochMilli());
+  }
+
+  public static void createBaseFile(String basePath, String partitionPath, String instantTime, String fileId, long length, long lastModificationTimeMilli)
+      throws Exception {
     Path parentPath = Paths.get(basePath, partitionPath);
     Files.createDirectories(parentPath);
     Path baseFilePath = parentPath.resolve(baseFileName(instantTime, fileId));
     if (Files.notExists(baseFilePath)) {
       Files.createFile(baseFilePath);
     }
     new RandomAccessFile(baseFilePath.toFile(), "rw").setLength(length);
+    Files.setLastModifiedTime(baseFilePath, FileTime.fromMillis(lastModificationTimeMilli));

Review comment:
@nsivabalan the problem comes from multiple input files having the same last
modification time. I uploaded a screenshot to the JIRA ticket and am also
posting it here for easy illustration:

![Screen Shot 2021-03-26 at 1 42 42 AM](https://user-images.githubusercontent.com/2701446/119078833-cde7d180-b9ab-11eb-9d0f-32625dc30b3c.png)

DFSPathSelector reads the last modification time and saves it as the
checkpoint, which is then compared against the next batch of input files. It's
not about files being mutated; the input files are append-only, so last mod
time _is_ create time. The test setup sets the mod time explicitly so that the
files are guaranteed to share the same value, instead of picking up different
timestamps from delays in code execution while they are created.
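
To make the scenario concrete, here is a minimal, self-contained sketch of how
a checkpoint based on last mod time can interact with files that share the
same timestamp. It is not the DFSPathSelector code: the class
`ModTimeCheckpointSketch`, its helpers, the strict greater-than comparison, and
the `maxFiles` cap are all assumptions made for illustration. Only
`Files.setLastModifiedTime` and `FileTime.fromMillis` come from the diff above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Hudi.
class ModTimeCheckpointSketch {

  // Creates a fresh temporary directory for the demo files.
  static Path tempDir() {
    try {
      return Files.createTempDirectory("modtime-sketch");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Creates an empty file and pins its last-modified time, as the test util does.
  static Path touch(Path dir, String name, long modTimeMillis) {
    try {
      Path file = Files.createFile(dir.resolve(name));
      Files.setLastModifiedTime(file, FileTime.fromMillis(modTimeMillis));
      return file;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static long modTime(Path file) {
    try {
      return Files.getLastModifiedTime(file).toMillis();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Assumed selection rule: files strictly newer than the checkpoint,
  // oldest first, capped at maxFiles per batch.
  static List<Path> selectBatch(List<Path> candidates, long checkpointMillis, int maxFiles) {
    List<Path> newer = new ArrayList<>();
    for (Path p : candidates) {
      if (modTime(p) > checkpointMillis) {
        newer.add(p);
      }
    }
    newer.sort(Comparator.comparingLong(ModTimeCheckpointSketch::modTime));
    return new ArrayList<>(newer.subList(0, Math.min(maxFiles, newer.size())));
  }

  // The next checkpoint is the largest mod time seen in the batch.
  static long nextCheckpoint(List<Path> batch, long previousCheckpoint) {
    long max = previousCheckpoint;
    for (Path p : batch) {
      max = Math.max(max, modTime(p));
    }
    return max;
  }

  public static void main(String[] args) {
    Path dir = tempDir();
    long t = 1_616_000_000_000L; // the same mod time for every input file
    List<Path> files = Arrays.asList(touch(dir, "a", t), touch(dir, "b", t), touch(dir, "c", t));

    List<Path> first = selectBatch(files, 0L, 2);
    long checkpoint = nextCheckpoint(first, 0L);
    List<Path> second = selectBatch(files, checkpoint, 2);

    System.out.println("first batch: " + first.size() + ", second batch: " + second.size());
  }
}
```

Under these assumptions, the first batch takes two of the three files and the
checkpoint advances to their shared mod time, so the third file never
qualifies as newer. Reproducing this in a test requires the files to have
exactly the same mod time, which is why the new `createBaseFile` overload lets
the caller pass it in.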



