[
https://issues.apache.org/jira/browse/IMPALA-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-3173:
----------------------------------
Priority: Major (was: Critical)
> Reduce catalog's memory footprint
> ---------------------------------
>
> Key: IMPALA-3173
> URL: https://issues.apache.org/jira/browse/IMPALA-3173
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 2.2.4
> Reporter: Dimitris Tsirogiannis
> Priority: Major
> Labels: catalog-server, performance, usability
>
> An initial analysis of the catalog's heap dumps shows that we can probably
> reduce its memory footprint by: a) avoiding redundant information about
> catalog entities such as partitions, and b) using more compact data
> structures.
> Currently, for a table with 2 int columns and 1 int partition column and
> without incremental stats, we use:
> * *~930B* per partition, of which ~500B are used by hmsParameters_
> (a Map<String, String>), ~190B by cachedMsPartitionDescriptor_, and ~200B
> (depending on the path) by location.
> * *~800B* per file descriptor, of which ~530B go to file_blocks and the
> rest is used for storing the file_name.
> * Every HdfsTable also uses two maps that replicate partition locations and
> file names (e.g. perPartitionFileDescMap_ and nameToPartitionMap_).
> A table like that with 100,000 partitions and 10 files per partition requires
> about 1GB of memory without incremental stats and about 1.4GB with them.
> This is a parent JIRA of IMPALA-2840.
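
As a rough check of the arithmetic in the description, the per-partition and per-file-descriptor sizes can be multiplied out for the example table. This is only a back-of-the-envelope sketch: the function and constant names are made up for illustration and are not part of Impala, and the two per-table maps (perPartitionFileDescMap_ and nameToPartitionMap_) are left out because the description gives no size for them. The result is therefore a lower bound, and it comes out a little under the roughly 1GB figure quoted above.

```python
# Approximate per-object sizes quoted in the issue description (bytes),
# for a table without incremental stats.
BYTES_PER_PARTITION = 930
BYTES_PER_FILE_DESCRIPTOR = 800


def estimate_catalog_bytes(num_partitions, files_per_partition):
    """Hypothetical helper: lower-bound heap estimate for one table.

    Counts only partition objects and file descriptors; the per-table
    maps that replicate locations and file names are not included.
    """
    partition_bytes = num_partitions * BYTES_PER_PARTITION
    file_bytes = num_partitions * files_per_partition * BYTES_PER_FILE_DESCRIPTOR
    return partition_bytes + file_bytes


# The example table from the description: 100,000 partitions, 10 files each.
total = estimate_catalog_bytes(100_000, 10)
print(f"{total / 1e9:.2f} GB")
```

With these inputs the file descriptors account for 800MB of the total and the partition objects for 93MB, which suggests that file metadata dominates the footprint for tables of this shape.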
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)