Raymond Xu created HUDI-712:
-------------------------------

             Summary: Improve exporter performance and memory usage
                 Key: HUDI-712
                 URL: https://issues.apache.org/jira/browse/HUDI-712
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: Utilities
            Reporter: Raymond Xu


[https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L103-L107]

The way the data file list for export is collected can be improved, because:
 * the listing is not parallelized across partitions
 * the resulting list can be too large to hold in memory
 * listing a partition to get its latest files requires scanning all of its files (RFC-15 could solve this)
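
As a rough illustration of the first point only, the sketch below lists several partition directories concurrently instead of one after another. It is not the exporter's code: it uses plain java.nio and a parallel stream on a local file system so that it stays self-contained, whereas the real exporter would presumably distribute the listing through Spark and Hudi's own file-system view. The class and method names (ParallelPartitionListing, listDataFiles, listAllDataFiles) are hypothetical, and it does not select the latest file versions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; none of these names exist in Hudi.
final class ParallelPartitionListing {

    // Lists the regular files directly under one partition directory,
    // sorted so that the result is deterministic.
    static List<Path> listDataFiles(Path partition) {
        try (Stream<Path> entries = Files.list(partition)) {
            return entries.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lists all partitions concurrently rather than in a sequential loop.
    // The result keeps the order of the input partition list.
    static List<Path> listAllDataFiles(List<Path> partitions) {
        return partitions.parallelStream()
                .flatMap(p -> listDataFiles(p).stream())
                .collect(Collectors.toList());
    }
}
```

This still materializes the whole list in one place, so it does nothing for the second point. Keeping the listing distributed and copying the files in the same job, rather than collecting the list first, would be one way to address that as well.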

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
