SteNicholas commented on a change in pull request #3787:
URL: https://github.com/apache/hudi/pull/3787#discussion_r728579081



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -142,6 +143,60 @@ public WriteOperationType getOperationType() {
     return fileGroupIdToFullPaths;
   }
 
+  /**
+   * Extract the file status of all affected files from the commit metadata. If a file has
+   * been touched multiple times in the given commits, the return value will keep the one
+   * from the latest commit.
+   *
+   * @param basePath The base path
+   * @return the file full path to file status mapping
+   */
+  public Map<String, FileStatus> getFullPathToFileStatus(String basePath) {
+    Map<String, FileStatus> fullPathToFileStatus = new HashMap<>();
+    for (List<HoodieWriteStat> stats : getPartitionToWriteStats().values()) {
+      // Iterate through all the written files.
+      for (HoodieWriteStat stat : stats) {
+        String relativeFilePath = stat.getPath();
+        Path fullPath = relativeFilePath != null ? FSUtils.getPartitionPath(basePath, relativeFilePath) : null;

Review comment:
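       IMO, this could check directly whether `relativeFilePath` is null and, if it is not, put the `fileStatus` into `fullPathToFileStatus`, rather than deriving a nullable `fullPath` and null-checking that afterwards.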
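
A minimal sketch of the shape suggested above, under stated assumptions: `WriteStat` and `FileInfo` are hypothetical stand-ins for `HoodieWriteStat` and Hadoop's `FileStatus`, and plain string concatenation stands in for `FSUtils.getPartitionPath`, so that the snippet compiles without Hudi or Hadoop on the classpath. Only the placement of the null check is the point.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NullCheckSketch {

  // Hypothetical stand-in for HoodieWriteStat: only the fields the loop reads.
  static final class WriteStat {
    final String path;            // relative file path, may be null
    final long fileSizeInBytes;

    WriteStat(String path, long fileSizeInBytes) {
      this.path = path;
      this.fileSizeInBytes = fileSizeInBytes;
    }
  }

  // Hypothetical stand-in for Hadoop's FileStatus.
  static final class FileInfo {
    final long length;
    final String fullPath;

    FileInfo(long length, String fullPath) {
      this.length = length;
      this.fullPath = fullPath;
    }
  }

  static Map<String, FileInfo> fullPathToFileInfo(
      String basePath, Map<String, List<WriteStat>> partitionToWriteStats) {
    Map<String, FileInfo> result = new HashMap<>();
    for (List<WriteStat> stats : partitionToWriteStats.values()) {
      for (WriteStat stat : stats) {
        String relativeFilePath = stat.path;
        // Single null check on the relative path; no nullable fullPath is needed.
        if (relativeFilePath != null) {
          // Assumption: concatenation stands in for FSUtils.getPartitionPath.
          String fullPath = basePath + "/" + relativeFilePath;
          // Keyed by file name, mirroring fullPath.getName() in the diff.
          String fileName = fullPath.substring(fullPath.lastIndexOf('/') + 1);
          result.put(fileName, new FileInfo(stat.fileSizeInBytes, fullPath));
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<WriteStat>> stats = new HashMap<>();
    stats.put("p1", Arrays.asList(
        new WriteStat("p1/f1.parquet", 10L),
        new WriteStat(null, 5L)));   // skipped by the null check
    System.out.println(fullPathToFileInfo("/base", stats).keySet());
  }
}
```

The behaviour is intended to match the original ternary plus `fullPath != null` check; the difference is that the full path is only built inside the branch that uses it.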

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieCommitMetadata.java
##########
@@ -142,6 +143,60 @@ public WriteOperationType getOperationType() {
     return fileGroupIdToFullPaths;
   }
 
+  /**
+   * Extract the file status of all affected files from the commit metadata. If a file has
+   * been touched multiple times in the given commits, the return value will keep the one
+   * from the latest commit.
+   *
+   * @param basePath The base path
+   * @return the file full path to file status mapping
+   */
+  public Map<String, FileStatus> getFullPathToFileStatus(String basePath) {
+    Map<String, FileStatus> fullPathToFileStatus = new HashMap<>();
+    for (List<HoodieWriteStat> stats : getPartitionToWriteStats().values()) {
+      // Iterate through all the written files.
+      for (HoodieWriteStat stat : stats) {
+        String relativeFilePath = stat.getPath();
+        Path fullPath = relativeFilePath != null ? FSUtils.getPartitionPath(basePath, relativeFilePath) : null;
+        if (fullPath != null) {
+          FileStatus fileStatus = new FileStatus(stat.getFileSizeInBytes(), false, 0, 0,
+              0, fullPath);
+          fullPathToFileStatus.put(fullPath.getName(), fileStatus);
+        }
+      }
+    }
+    return fullPathToFileStatus;
+  }
+
+  /**
+   * Extract the file status of all affected files from the commit metadata. If a file has
+   * been touched multiple times in the given commits, the return value will keep the one
+   * from the latest commit by file group ID.
+   *
+   * <p>Note: different with {@link #getFullPathToFileStatus(String)},
+   * only the latest commit file for a file group is returned,
+   * this is an optimization for COPY_ON_WRITE table to eliminate legacy files for filesystem view.
+   *
+   * @param basePath The base path
+   * @return the file ID to file status mapping
+   */
+  public Map<String, FileStatus> getFileIdToFileStatus(String basePath) {
+    Map<String, FileStatus> fileIdToFileStatus = new HashMap<>();
+    for (List<HoodieWriteStat> stats : getPartitionToWriteStats().values()) {
+      // Iterate through all the written files.
+      for (HoodieWriteStat stat : stats) {
+        String relativeFilePath = stat.getPath();
+        Path fullPath = relativeFilePath != null ? FSUtils.getPartitionPath(basePath, relativeFilePath) : null;

Review comment:
       Ditto: check whether `relativeFilePath` is null directly here as well.

##########
File path: hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteFunction.java
##########
@@ -139,7 +139,7 @@ public void processElement(I value, ProcessFunction<I, Object>.Context ctx, Coll
   public void close() {
     if (this.writeClient != null) {
       this.writeClient.cleanHandlesGracefully();
-      this.writeClient.close();
+      // this.writeClient.close();

Review comment:
       Why doesn't this invoke the `close` method of the write client?

