[
https://issues.apache.org/jira/browse/IMPALA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved IMPALA-13789.
-------------------------------------
Fix Version/s: Impala 4.6.0
Resolution: Fixed
Resolving this. Thanks [~MikaelSmith], [~boroknagyz] and [~daniel.becker] for
the review!
> Avoid holding lots of org.apache.hadoop.fs.Path objects in memory
> -----------------------------------------------------------------
>
> Key: IMPALA-13789
> URL: https://issues.apache.org/jira/browse/IMPALA-13789
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Fix For: Impala 4.6.0
>
> Attachments: histogram_path_objects.png, path_example.png
>
>
> When loading the file metadata of a table, we create several Java Maps that use
> org.apache.hadoop.fs.Path as the key type, e.g. in
> [ParallelFileMetadataLoader|https://github.com/apache/impala/blob/cfeb57c128c7f514f3433a0399966f46a49a1a4a/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L74-L75]:
> {code:java}
> private final Map<Path, FileMetadataLoader> loaders_;
> private final Map<Path, List<HdfsPartition.Builder>> partsByPath_;{code}
> Keeping these Path objects in memory is expensive because there is one per
> partition.
> The following histogram shows that 4.3M such Path objects take about 3GB of
> memory:
> !histogram_path_objects.png|width=737,height=463!
> Here is an example Path object that takes 704 bytes, while the actual partition
> location string takes just 160 bytes. The rest of the space is wasted on the
> fields of java.net.URI:
> !path_example.png|width=818,height=287!
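As a rough cross-check of the figures above, assuming every Path retains about as much as the 704-byte example: 4.3M x 704 bytes is about 3.03 GB (decimal), which lines up with the ~3GB in the histogram, while the same number of 160-byte location strings would be about 0.69 GB. The String figure is an upper bound that assumes each key is a separate String copy; if the partitions already hold their location strings, the extra cost may be little more than a reference per entry. The class and helper names below are illustrative only.

{code:java}
import java.util.Locale;

public class PathMemoryEstimate {
    // Total retained bytes when every object retains the same number of bytes.
    public static long totalBytes(long objects, long bytesPerObject) {
        return objects * bytesPerObject;
    }

    public static void main(String[] args) {
        long pathObjects = 4_300_000L;                  // count from the histogram
        long asPaths = totalBytes(pathObjects, 704L);   // size of the example Path
        long asStrings = totalBytes(pathObjects, 160L); // its location string alone
        // Locale.ROOT keeps '.' as the decimal separator on any JVM locale.
        System.out.println(String.format(Locale.ROOT, "Path keys:   %.2f GB", asPaths / 1e9));
        System.out.println(String.format(Locale.ROOT, "String keys: %.2f GB", asStrings / 1e9));
    }
}
{code}

Running it prints 3.03 GB for the Path keys and 0.69 GB for the String keys.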
> We can use the partition location String as the key type and create Path
> objects only when loading the corresponding partition.
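A minimal sketch of that idea follows; it is not the actual Impala change. It mirrors the partsByPath_ map with String keys and builds the heavier object only while a location is being loaded. Assumptions: LazyPathLoaderSketch, PartitionStub and loadFiles are hypothetical names, and java.net.URI stands in for org.apache.hadoop.fs.Path so the sketch compiles without Hadoop on the classpath (URI.create may also reject some unescaped characters that Path's String constructor accepts).

{code:java}
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LazyPathLoaderSketch {
    // Hypothetical stand-in for HdfsPartition.Builder; only its location matters here.
    public static final class PartitionStub {
        final String location;

        public PartitionStub(String location) {
            this.location = location;
        }
    }

    // Keyed by the plain location String rather than by a Path (or URI) object.
    private final Map<String, List<PartitionStub>> partsByPath = new HashMap<>();

    public void addPartition(PartitionStub part) {
        partsByPath.computeIfAbsent(part.location, k -> new ArrayList<>()).add(part);
    }

    public int numLocations() {
        return partsByPath.size();
    }

    // Loads every location once. The URI (standing in for org.apache.hadoop.fs.Path)
    // is created on demand and is unreachable once its loop iteration ends.
    public int loadAll() {
        int loaded = 0;
        for (Map.Entry<String, List<PartitionStub>> e : partsByPath.entrySet()) {
            URI dir = URI.create(e.getKey());
            loaded += loadFiles(dir, e.getValue());
        }
        return loaded;
    }

    // Hypothetical placeholder: a real loader would list the files under 'dir'
    // and attach them to each partition; here it just counts the partitions.
    private static int loadFiles(URI dir, List<PartitionStub> parts) {
        return parts.size();
    }

    public static void main(String[] args) {
        LazyPathLoaderSketch loader = new LazyPathLoaderSketch();
        loader.addPartition(new PartitionStub("hdfs://nn:8020/warehouse/t/p=1"));
        loader.addPartition(new PartitionStub("hdfs://nn:8020/warehouse/t/p=2"));
        loader.addPartition(new PartitionStub("hdfs://nn:8020/warehouse/t/p=2"));
        // Two distinct locations, three partitions.
        System.out.println(loader.numLocations() + " locations, "
                + loader.loadAll() + " partitions loaded");
    }
}
{code}

Running main prints "2 locations, 3 partitions loaded". The trade-off is that each location string is parsed again when it is loaded; since that happens once per location, the cost is likely small compared with keeping millions of Path objects reachable for the whole load.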
--
This message was sent by Atlassian Jira
(v8.20.10#820010)