Raymond Xu created HUDI-712:
-------------------------------

             Summary: Improve exporter performance and memory usage
                 Key: HUDI-712
                 URL: https://issues.apache.org/jira/browse/HUDI-712
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: Utilities
            Reporter: Raymond Xu

[https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L103-L107]

The way the data file list for export is collected can be improved, because:

* listing is not parallelized across partitions
* the resulting list can be too large to hold in memory
* listing a partition to get its latest files requires scanning all files (RFC-15 could solve this)
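
To illustrate the first two points, here is a minimal sketch, not the exporter's actual code. It lists partitions in parallel and hands each partition's files to a copy step as soon as they are known, instead of first collecting one global list. The class name, `forEachPartition`, `listLatestFiles` and `copyFiles` are all hypothetical, and plain Java parallel streams stand in for whatever distributed mechanism the exporter would really use.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Hudi.
final class ParallelPartitionListing {

    private ParallelPartitionListing() {
    }

    /**
     * Lists each partition in parallel and passes that partition's files
     * straight to the consumer, so no single list of all files is built.
     *
     * @param partitions      partition paths to export
     * @param listLatestFiles hypothetical lister returning the latest data
     *                        files of one partition
     * @param copyFiles       hypothetical copy step; it is called from
     *                        several threads, so it must be thread-safe
     */
    static void forEachPartition(
            List<String> partitions,
            Function<String, List<String>> listLatestFiles,
            Consumer<List<String>> copyFiles) {
        partitions.parallelStream()
                  .map(listLatestFiles)   // one listing call per partition
                  .forEach(copyFiles);    // consume per partition, no global list
    }
}
```

This does not address the third point: each per-partition listing still has to scan that partition's files, which is the part RFC-15 might remove.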