Tianyi Wang resolved IMPALA-5990.
Fix Version/s: Impala 2.12.0
IMPALA-5990: End-to-end compression of metadata
Currently the catalog data is compressed in the statestore, but
uncompressed when passed between FE and BE. This results in a ~2GB limit
on the metadata. IMPALA-3499 introduced a workaround in the impalad but
there isn't one in the catalogd. This patch aims to increase the size
limit for statestore updates, reduce the copying of the metadata and
reduce the memory footprint. With this patch, the catalog objects are
passed and (de)compressed between FE and BE one at a time. The new
size limits are:
- A single catalog object cannot be larger than ~2GB.
- A statestore catalog update cannot be larger than ~4GB. This limit
applies to the compressed size if FLAGS_compact_catalog_topic is true.
The behavior of the catalog op executor is not changed. Its data is not
compressed and its size limit is still 2GB.
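
As a rough illustration of the per-object scheme and the two size limits
described above, here is a minimal sketch in Python. It is not Impala's
code: the function names (build_update, read_update) are hypothetical,
zlib stands in for LZ4 because it ships with the Python standard library,
and the "~2GB" and "~4GB" limits are assumed to be the signed and unsigned
32-bit maxima, which the message does not state exactly.

```python
import zlib

# Assumed constants: the message only says "~2GB" and "~4GB".
MAX_OBJECT_BYTES = 2**31 - 1   # per catalog object (assumption)
MAX_UPDATE_BYTES = 2**32 - 1   # per statestore catalog update (assumption)


def build_update(objects, compact=True,
                 max_object_bytes=MAX_OBJECT_BYTES,
                 max_update_bytes=MAX_UPDATE_BYTES):
    """Build an update from serialized catalog objects, one object at a time.

    `compact` plays the role of FLAGS_compact_catalog_topic: when it is True,
    each object is compressed individually and the update limit is checked
    against the compressed size; otherwise against the uncompressed size.
    """
    payloads = []
    total = 0
    for serialized in objects:
        # The per-object limit applies to the uncompressed object.
        if len(serialized) > max_object_bytes:
            raise ValueError("catalog object exceeds the per-object limit")
        # zlib is a stand-in for LZ4 here.
        payload = zlib.compress(serialized) if compact else serialized
        total += len(payload)
        if total > max_update_bytes:
            raise ValueError("catalog update exceeds the update limit")
        payloads.append(payload)
    return payloads


def read_update(payloads, compact=True):
    """Decompress the payloads of an update, again one object at a time."""
    return [zlib.decompress(p) if compact else p for p in payloads]
```

Because each object is handled separately, no buffer ever has to hold the
whole uncompressed update, which is the property the patch relies on to
raise the update limit beyond the size of a single object.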
Testing: Ran existing tests. A test for compressing and decompressing
catalog objects is added. Manually tested with a 1.95GB catalog object
and a 3.90GB uncompressed statestore update.
Reviewed-by: Tianyi Wang <tw...@cloudera.com>
Tested-by: Impala Public Jenkins
> End-to-end compression of metadata
> Key: IMPALA-5990
> URL: https://issues.apache.org/jira/browse/IMPALA-5990
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog, Frontend
> Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
> Reporter: Alexander Behm
> Assignee: Tianyi Wang
> Priority: Critical
> Fix For: Impala 2.12.0
> The metadata of large tables can become quite big making it costly to hold in
> the statestore and disseminate to coordinator impalads. The metadata can even
> get so big that fundamental limits like the JVM 2GB array size and the Thrift
> 4GB limit are hit and lead to downtime.
> For reducing the statestore metadata topic size we have an existing
> "compact_catalog_topic" flag which LZ4 compresses the metadata payload for
> the C++ codepaths catalogd->statestore and statestore->impalad.
> Unfortunately, the metadata is not compressed in the same way during the
> FE->BE transition on the catalogd and the BE->FE transition on the impalad.
> The goal of this change is to enable end-to-end compression for the full path
> of metadata dissemination. The existing code paths also need significant
> cleanup/streamlining. Ideally, the new code should provide consistent size
> limits everywhere.