[ 
https://issues.apache.org/jira/browse/IMPALA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-13789.
-------------------------------------
    Fix Version/s: Impala 4.6.0
       Resolution: Fixed

Resolving this. Thanks to [~MikaelSmith], [~boroknagyz] and [~daniel.becker] for 
the review!

> Avoid holding lots of org.apache.hadoop.fs.Path objects in memory
> -----------------------------------------------------------------
>
>                 Key: IMPALA-13789
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13789
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.6.0
>
>         Attachments: histogram_path_objects.png, path_example.png
>
>
> When loading file metadata of a table, we create several Java Maps that use 
> org.apache.hadoop.fs.Path as the key type, e.g. in 
> [ParallelFileMetadataLoader|https://github.com/apache/impala/blob/cfeb57c128c7f514f3433a0399966f46a49a1a4a/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L74-L75]:
> {code:java}
>   private final Map<Path, FileMetadataLoader> loaders_;
>   private final Map<Path, List<HdfsPartition.Builder>> partsByPath_;{code}
> Keeping these Path objects in memory is expensive because there is one of 
> them per partition.
> The following histogram shows that 4.3M such Path objects take 3GB of 
> memory:
> !histogram_path_objects.png|width=737,height=463!
> Here is an example Path object that takes 704 bytes. The actual partition 
> location string takes just 160 bytes. The remaining space is wasted on fields of 
> java.net.URI:
> !path_example.png|width=818,height=287!
> We can use the String of the partition location as the key type and only 
> create Path objects when loading that partition.
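
A minimal sketch of the idea proposed above, not the actual Impala patch: the map is keyed by the partition location string, and a heavier object is built only when a location is loaded. All names here (LocationKeyedPartitions, addPartition, loadOne) are hypothetical, and java.net.URI stands in for org.apache.hadoop.fs.Path so that the example runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; class and method names are assumptions. */
final class LocationKeyedPartitions {
  // Keyed by the plain location string rather than a Path/URI object,
  // so no parsed URI fields are retained per partition.
  private final Map<String, List<String>> partsByLocation_ = new HashMap<>();

  /** Registers a partition name under its location string. */
  void addPartition(String location, String partitionName) {
    partsByLocation_
        .computeIfAbsent(location, k -> new ArrayList<>())
        .add(partitionName);
  }

  int numLocations() {
    return partsByLocation_.size();
  }

  List<String> partitionsAt(String location) {
    return partsByLocation_.getOrDefault(location, Collections.emptyList());
  }

  /**
   * Builds the heavyweight object only at load time. URI is a stand-in for
   * Path here; it is a local variable and becomes garbage after the call.
   */
  String loadOne(String location) {
    URI uri = URI.create(location);
    return uri.getPath();
  }
}
```

One consideration with this approach: two strings that name the same directory but differ textually (for example, by a trailing slash) become distinct keys, so the location strings would need to be in a consistent form before being used as keys.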



--
This message was sent by Atlassian Jira
(v8.20.10#820010)