Quanlong Huang created IMPALA-13789:
---------------------------------------
Summary: Avoid holding lots of org.apache.hadoop.fs.Path objects in memory
Key: IMPALA-13789
URL: https://issues.apache.org/jira/browse/IMPALA-13789
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang
Attachments: histogram_path_objects.png
When loading the file metadata of a table, we create several Java Maps that use
org.apache.hadoop.fs.Path as the key type, e.g. in ParallelFileMetadataLoader:
{code:java}
private final Map<Path, FileMetadataLoader> loaders_;
private final Map<Path, List<HdfsPartition.Builder>> partsByPath_;
{code}
Keeping these Path objects in memory is expensive, since there are as many of
them as there are partitions.
The following histogram shows that 4.3M such Path objects take 3GB of memory,
i.e. roughly 700 bytes per Path object on average:
!histogram_path_objects.png!
Instead, we can use the partition location String as the key type and create
Path objects only when loading the corresponding partition.
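A minimal sketch of that idea, assuming a hypothetical helper class LazyPathLoader (not an existing Impala class): partitions are grouped by their location String, and a path object is built by a caller-supplied factory only just before that location is loaded. The path type is left generic so the sketch compiles without Hadoop on the classpath; in the catalog it would presumably be org.apache.hadoop.fs.Path, built with its Path(String) constructor (e.g. Path::new). Loading is sequential here for brevity, unlike ParallelFileMetadataLoader.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical helper (not part of Impala): keys partitions by their location
// String and creates a path object only while that location is being loaded.
// P stands in for org.apache.hadoop.fs.Path, B for HdfsPartition.Builder.
class LazyPathLoader<P, B> {
  // String keys instead of Path keys: no path objects are retained here.
  private final Map<String, List<B>> partsByLocation_ = new LinkedHashMap<>();
  // Builds a path from a location String, e.g. Path::new for Hadoop paths.
  private final Function<String, P> pathFactory_;

  LazyPathLoader(Function<String, P> pathFactory) {
    pathFactory_ = pathFactory;
  }

  // Groups partitions that share the same location String.
  void addPartition(String location, B partition) {
    partsByLocation_.computeIfAbsent(location, k -> new ArrayList<>()).add(partition);
  }

  // Loads each distinct location once, sequentially for brevity. The path
  // object is local to one iteration, so unless the loader keeps a reference
  // to it, it becomes unreachable once that location is processed.
  // Returns the number of distinct locations loaded.
  int loadAll(BiConsumer<P, List<B>> loader) {
    int loaded = 0;
    for (Map.Entry<String, List<B>> e : partsByLocation_.entrySet()) {
      P path = pathFactory_.apply(e.getKey());
      loader.accept(path, e.getValue());
      loaded++;
    }
    return loaded;
  }
}
{code}

The trade-off is that each Path is rebuilt from its String at load time instead of being kept alive for the whole table load, exchanging a small amount of CPU for not retaining one Path object per partition.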
--
This message was sent by Atlassian Jira
(v8.20.10#820010)