bharath v created IMPALA-7708:
---------------------------------
Summary: Switch to faster compression strategy for incremental stats
Key: IMPALA-7708
URL: https://issues.apache.org/jira/browse/IMPALA-7708
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Affects Versions: Impala 3.1.0
Reporter: bharath v
Assignee: bharath v
Currently we set the Deflater compression level to BEST_COMPRESSION by default.
{noformat}
public static byte[] deflateCompress(byte[] input) {
  if (input == null) return null;
  ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
  // TODO: Benchmark other compression levels.
  DeflaterOutputStream stream =
      new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
{noformat}
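In some experiments, we noticed that the fastest compression level (BEST_SPEED)
runs ~8x faster (11s vs. 92s) while producing only ~9% more output (212 MB vs.
194 MB), which is a penalty of about 4 percentage points in compression ratio
relative to the uncompressed size.
Here are some results on a real-world table with 3000 partitions with
incremental stats.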
| |Time taken for serialization (seconds)|OutputBytes size (MB)|
|Gzip best compression|92|194|
|Gzip fastest compression|11|212|
|Gzip default compression|57|195|
|No compression|5|452|
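
A comparison along these lines could be reproduced with a small standalone harness. The following is only a minimal sketch, not the Impala code: the class name DeflateLevelComparison, the extra level parameter on deflateCompress, the inflate helper, and the synthetic input are all assumptions made for illustration. Timings from a single un-warmed run like this are rough and will not match the numbers in the table.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical harness for comparing Deflater levels; not part of Impala.
class DeflateLevelComparison {
  // Compresses 'input' at the given Deflater level. Returns null for null input.
  static byte[] deflateCompress(byte[] input, int level) {
    if (input == null) return null;
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    Deflater deflater = new Deflater(level);
    try (DeflaterOutputStream stream = new DeflaterOutputStream(bos, deflater)) {
      stream.write(input);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      // A caller-supplied Deflater is not released by the stream's close().
      deflater.end();
    }
    return bos.toByteArray();
  }

  // Decompresses data produced by deflateCompress; used to check the round trip.
  static byte[] inflate(byte[] compressed) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (InflaterInputStream in =
        new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) bos.write(buf, 0, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    // Synthetic, semi-compressible input. This is NOT real incremental stats data.
    Random rng = new Random(42);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200_000; i++) {
      sb.append("partition=").append(i)
          .append(",ndv=").append(rng.nextInt(1_000_000)).append('\n');
    }
    byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);

    int[] levels = {
        Deflater.BEST_COMPRESSION, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_SPEED};
    String[] names = {"BEST_COMPRESSION", "DEFAULT_COMPRESSION", "BEST_SPEED"};
    for (int i = 0; i < levels.length; i++) {
      long start = System.nanoTime();
      byte[] out = deflateCompress(input, levels[i]);
      long millis = (System.nanoTime() - start) / 1_000_000;
      System.out.printf("%-20s %6d ms %10d bytes%n", names[i], millis, out.length);
    }
  }
}
```

Taking the level as a parameter keeps the compression path identical across runs, so any difference in time or output size comes from the level alone. For numbers worth acting on, a proper benchmark harness with warm-up iterations and real catalog data would be needed.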