[ 
https://issues.apache.org/jira/browse/IMPALA-13789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17931322#comment-17931322
 ] 

ASF subversion and git services commented on IMPALA-13789:
----------------------------------------------------------

Commit 8c51f72e10388b0130811a9bfb594b51099b6bb6 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8c51f72e1 ]

IMPALA-13789: Defer creating Path objects in loading file metadata

When loading file metadata of an HdfsTable, we create an
org.apache.hadoop.fs.Path object for each partition dir before actually
loading its file metadata. These Path objects have a large memory
footprint as the underlying java.net.URI objects have extra fields
extracted from the location string. E.g. a location string that takes
160 bytes has a corresponding Path object that takes 704 bytes. See more
details of this example in the JIRA description.
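
To illustrate where the extra fields come from, here is a minimal sketch
that uses only java.net.URI, the class underlying Path as noted above.
The class name UriFieldsDemo and the location string are hypothetical and
not taken from the JIRA. The sketch does not reproduce the 704-byte
measurement; it only shows that one location string is split into several
separately held components.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class UriFieldsDemo {
    // Returns some of the components java.net.URI extracts from a location string.
    public static Map<String, String> components(String location) {
        URI uri = URI.create(location);
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("authority", uri.getAuthority());
        parts.put("host", uri.getHost());
        parts.put("port", String.valueOf(uri.getPort()));
        parts.put("path", uri.getPath());
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical partition location, for illustration only.
        String location =
            "hdfs://namenode.example.com:8020/warehouse/db/tbl/year=2024/month=01";
        components(location).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each component is kept alongside the original string, which is why the
parsed object can be several times larger than the string it came from.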

These Path objects are used as the keys of several Maps in the file
metadata loaders, and they are created before any metadata is actually
loaded, so one Path per partition is held in memory up front. This patch
uses the location strings as the keys instead and creates a Path object
only when we start loading the file metadata of that partition.
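
The idea can be sketched roughly as follows. This is not the code from
the patch: DeferredPathSketch, addPartition, toPath and load are
hypothetical names, partitions are represented by plain strings, and
java.net.URI stands in for org.apache.hadoop.fs.Path so that the example
runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeferredPathSketch {
    // Counts how many heavyweight objects were built; for illustration only.
    public static int parsedCount = 0;

    // Keyed by the raw location string instead of a parsed Path object.
    private final Map<String, List<String>> partsByLocation = new HashMap<>();

    // Registering a partition stores only strings; nothing is parsed here.
    public void addPartition(String location, String partitionName) {
        partsByLocation.computeIfAbsent(location, k -> new ArrayList<>())
            .add(partitionName);
    }

    // Stand-in for "new Path(location)" (assumption: URI replaces Path here).
    public static URI toPath(String location) {
        parsedCount++;
        return URI.create(location);
    }

    // The heavyweight object is created only when loading actually starts.
    public String load(String location) {
        URI path = toPath(location);
        List<String> parts =
            partsByLocation.getOrDefault(location, Collections.emptyList());
        // A real loader would list the files under 'path' and attach them
        // to 'parts'; here we just return the parsed path component.
        return path.getPath();
    }

    public static void main(String[] args) {
        DeferredPathSketch sketch = new DeferredPathSketch();
        sketch.addPartition("hdfs://nn:8020/tbl/p=1", "p=1");
        sketch.addPartition("hdfs://nn:8020/tbl/p=2", "p=2");
        System.out.println("parsed before load: " + parsedCount);
        sketch.load("hdfs://nn:8020/tbl/p=1");
        System.out.println("parsed after one load: " + parsedCount);
    }
}
```

With this shape, the object built in load() can become garbage as soon
as that partition is done, instead of living as a map key for the whole
table load.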

Tests:
 - Ran CORE tests
 - Analyzed a heap dump taken during file metadata loading; it no longer
   shows large numbers of Path objects.

Change-Id: I6ec1fc932eaf7c833ef6ee6cdb08bba235e38271
Reviewed-on: http://gerrit.cloudera.org:8080/22535
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> Avoid holding lots of org.apache.hadoop.fs.Path objects in memory
> -----------------------------------------------------------------
>
>                 Key: IMPALA-13789
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13789
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: histogram_path_objects.png, path_example.png
>
>
> When loading file metadata of a table, we create several Java Maps that use 
> org.apache.hadoop.fs.Path as the key type, e.g. in 
> [ParallelFileMetadataLoader|https://github.com/apache/impala/blob/cfeb57c128c7f514f3433a0399966f46a49a1a4a/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L74-L75]:
> {code:java}
>   private final Map<Path, FileMetadataLoader> loaders_;
>   private final Map<Path, List<HdfsPartition.Builder>> partsByPath_;{code}
> Keeping these Path objects in memory is expensive as there are as many of 
> them as the number of partitions.
> The following histogram shows that 4.3M such Path objects take 3GB in 
> memory:
> !histogram_path_objects.png|width=737,height=463!
> Here is an example Path object which takes 704 bytes. The actual partition 
> location string takes just 160 bytes. The remaining space is taken up by fields of 
> java.net.URI:
> !path_example.png|width=818,height=287!
> We can use the String of the partition location as the key type and only 
> create Path objects when loading that partition.


