[ https://issues.apache.org/jira/browse/HUDI-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-712:
--------------------------------
Labels: pull-request-available (was: )
> Improve exporter performance and memory usage
> ---------------------------------------------
>
> Key: HUDI-712
> URL: https://issues.apache.org/jira/browse/HUDI-712
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Utilities
> Reporter: Raymond Xu
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.12.2
>
>
> [https://github.com/apache/incubator-hudi/blob/99b7e9eb9ef8827c1e06b7e8621b6be6403b061e/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java#L103-L107]
> The way the data file list for export is collected can be improved, because
> * the listing is not parallelized across partitions
> * the resulting list can be too large
> * listing a partition to get the latest files requires scanning all files
> (RFC-15 could solve this)
>
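To illustrate the first two points in the description, here is a minimal sketch in plain Java, not the exporter's actual code. It lists each partition independently and in parallel, and hands files to the export step one partition at a time instead of first building a single list for the whole table. The names `ParallelListingSketch`, `exportByPartition`, `listPartition` and `exportFile` are hypothetical, and the in-memory map stands in for a real file-system listing. The real exporter would presumably distribute this work with Spark rather than a local parallel stream.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from Hudi.
class ParallelListingSketch {

    /**
     * Lists every partition in parallel and passes each file to exportFile
     * as soon as its partition has been listed, so only per-partition lists
     * are ever held in memory. Returns the number of files handed over.
     * exportFile must be safe to call from several threads.
     */
    public static long exportByPartition(List<String> partitions,
                                         Function<String, List<String>> listPartition,
                                         Consumer<String> exportFile) {
        return partitions.parallelStream()
                .mapToLong(partition -> {
                    // Only this partition's files are materialized here.
                    List<String> files = listPartition.apply(partition);
                    files.forEach(exportFile);
                    return files.size();
                })
                .sum();
    }

    public static void main(String[] args) {
        // Stand-in for a file-system listing: partition path -> data files.
        Map<String, List<String>> fakeListing = Map.of(
                "2020/03/01", List.of("a.parquet", "b.parquet"),
                "2020/03/02", List.of("c.parquet"));

        // Thread-safe set, because the callback runs on multiple threads.
        Set<String> exported = ConcurrentHashMap.newKeySet();

        long count = exportByPartition(
                List.of("2020/03/01", "2020/03/02"),
                fakeListing::get,
                exported::add);

        System.out.println("exported " + count + " files");
    }
}
```

The sketch does not address the third point: each partition is still listed in full, which is the cost that RFC-15 is expected to remove.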
--
This message was sent by Atlassian Jira
(v8.20.10#820010)