Raymond Xu created HUDI-712:
-------------------------------
Summary: Improve exporter performance and memory usage
Key: HUDI-712
URL: https://issues.apache.org/jira/browse/HUDI-712
Project: Apache Hudi (incubating)
Issue Type: Improvement
Components: Utilities
Reporter: Raymond Xu
[https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L103-L107]
The way the list of data files for export is collected can be improved, because:
* the listing is not parallelized across partitions
* the resulting list can be too large to hold in memory
* listing a partition to get the latest files requires scanning all of its files
(RFC-15 could solve this)
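
To make the first and third points concrete, here is a minimal, self-contained Java sketch of listing partitions in parallel and keeping only the latest file per file group. It is not the exporter's actual code: the class `ParallelListingSketch`, the `lister` callback, and the simplified `<fileId>_<commitTime>.parquet` file name layout are all assumptions made for illustration, and commit times are assumed to be equal-length strings so they compare lexicographically. In the real exporter the parallelism would presumably come from Spark rather than a parallel stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelListingSketch {

    // Hypothetical, simplified name layout: <fileId>_<commitTime>.parquet
    static String fileId(String name) {
        return name.substring(0, name.indexOf('_'));
    }

    static String commitTime(String name) {
        return name.substring(name.indexOf('_') + 1, name.lastIndexOf('.'));
    }

    // Keep, per file group, the newest file whose commit time is <= maxCommitTime.
    // Note that this still scans every file in the partition (the third point above).
    static List<String> latestFiles(List<String> files, String maxCommitTime) {
        Map<String, String> latest = new HashMap<>();
        for (String f : files) {
            if (commitTime(f).compareTo(maxCommitTime) > 0) {
                continue; // written after the snapshot being exported
            }
            latest.merge(fileId(f), f,
                (a, b) -> commitTime(a).compareTo(commitTime(b)) >= 0 ? a : b);
        }
        // TreeMap gives a deterministic order (sorted by file id).
        return new ArrayList<>(new TreeMap<>(latest).values());
    }

    // List every partition concurrently; `lister` stands in for a file system call.
    static Map<String, List<String>> listInParallel(
            List<String> partitions,
            Function<String, List<String>> lister,
            String maxCommitTime) {
        return partitions.parallelStream()
            .collect(Collectors.toConcurrentMap(
                p -> p,
                p -> latestFiles(lister.apply(p), maxCommitTime)));
    }
}
```

This sketch still gathers the whole result into one map, so it does not address the second point; in a Spark job the per-partition lists could stay distributed and be consumed by the copy step directly instead of being collected.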