Quanlong Huang created IMPALA-13789:
---------------------------------------

             Summary: Avoid holding lots of org.apache.hadoop.fs.Path objects 
in memory
                 Key: IMPALA-13789
                 URL: https://issues.apache.org/jira/browse/IMPALA-13789
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang
         Attachments: histogram_path_objects.png

When loading the file metadata of a table, we create several Java Maps that use 
org.apache.hadoop.fs.Path as the key type, e.g. in ParallelFileMetadataLoader:
{code:java}
  private final Map<Path, FileMetadataLoader> loaders_;
  private final Map<Path, List<HdfsPartition.Builder>> partsByPath_;{code}
Keeping these Path objects in memory is expensive because there is one of them 
per partition.

The following histogram shows that 4.3M such Path objects take 3GB of memory 
(roughly 700 bytes each):
!histogram_path_objects.png!

We can use the partition location String as the key type instead and only 
create a Path object while loading that partition.
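
The proposal above could look roughly like the following. This is a minimal sketch, not the actual Impala change: the class LocationKeyedLoaderSketch, its methods, and the use of plain partition names in place of HdfsPartition.Builder are all hypothetical. The path type is left generic so that the sketch compiles without Hadoop on the classpath. It also assumes that location strings are already in a canonical form, since String keys compare by exact text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

/**
 * Hypothetical sketch: group partitions by location String and build the
 * expensive path object only for the duration of loading that location.
 * P stands in for org.apache.hadoop.fs.Path.
 */
final class LocationKeyedLoaderSketch<P> {
  // String keys instead of Map<Path, ...>; no path object is retained here.
  private final Map<String, List<String>> partsByLocation_ = new LinkedHashMap<>();
  // Assumed factory, e.g. Path::new when P is org.apache.hadoop.fs.Path.
  private final Function<String, P> pathFactory_;

  LocationKeyedLoaderSketch(Function<String, P> pathFactory) {
    pathFactory_ = pathFactory;
  }

  /** Registers a partition (represented by its name) under its location. */
  void addPartition(String location, String partitionName) {
    partsByLocation_.computeIfAbsent(location, k -> new ArrayList<>())
        .add(partitionName);
  }

  int numLocations() { return partsByLocation_.size(); }

  /**
   * Invokes loadFn once per location. The path object is a local variable,
   * so it becomes garbage as soon as that location has been loaded.
   * Returns the number of path objects created.
   */
  int loadAll(BiConsumer<P, List<String>> loadFn) {
    int created = 0;
    for (Map.Entry<String, List<String>> e : partsByLocation_.entrySet()) {
      P path = pathFactory_.apply(e.getKey());
      created++;
      loadFn.accept(path, e.getValue());
    }
    return created;
  }
}
```

With Hadoop available, the factory would be Path::new, so at most one Path per in-flight load is alive at a time rather than one per partition for the lifetime of the maps.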


