[ https://issues.apache.org/jira/browse/IMPALA-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653850#comment-16653850 ]
ASF subversion and git services commented on IMPALA-7708:
---------------------------------------------------------
Commit 3b6c0f6296e807b25f3e40bd614b2571f4f01d48 in impala's branch
refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=3b6c0f6 ]
IMPALA-7708: Switch to faster deflater compression level for incr stats
On a table with 3000 partitions and ~150 columns, we noticed that the
BEST_SPEED deflater level is ~8x faster with a ~4% compression ratio
penalty. Given these results, this patch switches the default from
BEST_COMPRESSION to BEST_SPEED.
Change-Id: Ife688aca3aed0e1e8af26c8348b850175d84b4ad
Reviewed-on: http://gerrit.cloudera.org:8080/11685
Reviewed-by: Philip Zeyliger
Reviewed-by: Vuk Ercegovac
Tested-by: Impala Public Jenkins
> Switch to faster compression strategy for incremental stats
> -----------------------------------------------------------
>
> Key: IMPALA-7708
> URL: https://issues.apache.org/jira/browse/IMPALA-7708
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 3.1.0
> Reporter: bharath v
> Assignee: bharath v
> Priority: Major
>
> Currently we set the Deflater compression level to BEST_COMPRESSION by default.
> {noformat}
> public static byte[] deflateCompress(byte[] input) {
>   if (input == null) return null;
>   ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
>   // TODO: Benchmark other compression levels.
>   DeflaterOutputStream stream =
>       new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
> {noformat}
> In some experiments, the fastest compression level (BEST_SPEED) serialized
> ~8x faster than BEST_COMPRESSION (11 s vs. 92 s) at a cost of only ~4
> percentage points of compression ratio (212 MB vs. 194 MB of output from
> 452 MB uncompressed). Results on a real-world table with 3000 partitions
> and incremental stats:
>
> | |Serialization time (seconds)|Output size (MB)|
> |Gzip best compression|92|194|
> |Gzip fastest compression|11|212|
> |Gzip default compression|57|195|
> |No compression|5|452|
>
>
>
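The trade-off described in the issue can be explored with a small, self-contained sketch. This is not Impala's actual code: the class DeflateLevelSketch, the extra `level` parameter, the inflate() helper, and the synthetic sampleInput() data are all assumptions made for illustration. Only java.util.zip calls from the JDK are used, and the timings it prints will differ from the table in the issue because the input is not real incremental stats.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

// Hypothetical helper class for illustration; not part of Impala.
final class DeflateLevelSketch {

  // Compresses input at the given Deflater level, e.g. Deflater.BEST_SPEED.
  static byte[] deflateCompress(byte[] input, int level) throws IOException {
    if (input == null) return null;
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    Deflater deflater = new Deflater(level);
    try (DeflaterOutputStream stream = new DeflaterOutputStream(bos, deflater)) {
      stream.write(input);
    } finally {
      // The stream does not end a caller-supplied Deflater, so release it here.
      deflater.end();
    }
    return bos.toByteArray();
  }

  // Decompresses a zlib-wrapped deflate stream produced by deflateCompress().
  static byte[] inflate(byte[] compressed) throws DataFormatException {
    Inflater inflater = new Inflater();
    try {
      inflater.setInput(compressed);
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          throw new DataFormatException("truncated or unsupported deflate stream");
        }
        bos.write(buf, 0, n);
      }
      return bos.toByteArray();
    } finally {
      inflater.end();
    }
  }

  // Synthetic, repetitive stand-in for serialized stats (an assumption).
  static byte[] sampleInput(int rows) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < rows; i++) {
      sb.append("partition=").append(i)
        .append(",ndv=").append((i * 31) % 1000).append(';');
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    byte[] input = sampleInput(200_000);
    int[] levels = {
        Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
    String[] names = {"BEST_SPEED", "DEFAULT_COMPRESSION", "BEST_COMPRESSION"};
    for (int i = 0; i < levels.length; i++) {
      long start = System.nanoTime();
      byte[] out = deflateCompress(input, levels[i]);
      long micros = (System.nanoTime() - start) / 1_000;
      // Compression ratio here means compressed size / uncompressed size.
      double ratio = (double) out.length / input.length;
      System.out.printf("%-20s %8d us %10d bytes  ratio=%.3f%n",
          names[i], micros, out.length, ratio);
    }
  }
}
```

A single timed call like this is only a rough indication; a careful comparison would warm up the JVM and repeat each measurement, ideally on real serialized stats.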