[ 
https://issues.apache.org/jira/browse/IMPALA-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653850#comment-16653850
 ] 

ASF subversion and git services commented on IMPALA-7708:
---------------------------------------------------------

Commit 3b6c0f6296e807b25f3e40bd614b2571f4f01d48 in impala's branch 
refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=3b6c0f6 ]

IMPALA-7708: Switch to faster deflater compression level for incr stats

On a table with 3000 partitions and ~150 columns, we noticed that the
BEST_SPEED deflater level is ~8x faster with a ~4% compression ratio
penalty. Given these results, this patch switches the default from
BEST_COMPRESSION to BEST_SPEED.

Change-Id: Ife688aca3aed0e1e8af26c8348b850175d84b4ad
Reviewed-on: http://gerrit.cloudera.org:8080/11685
Reviewed-by: Philip Zeyliger <[email protected]>
Reviewed-by: Vuk Ercegovac <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Switch to faster compression strategy for incremental stats
> -----------------------------------------------------------
>
>                 Key: IMPALA-7708
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7708
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 3.1.0
>            Reporter: bharath v
>            Assignee: bharath v
>            Priority: Major
>
> Currently we set the Deflater mode to BEST_COMPRESSION by default.
> {noformat}
> public static byte[] deflateCompress(byte[] input) {
>     if (input == null) return null;
>     ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
>     // TODO: Benchmark other compression levels.
>     DeflaterOutputStream stream =
>         new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_COMPRESSION));
> {noformat}
> In some experiments, we noticed that the fastest compression level 
> (BEST_SPEED) is ~8x faster (11 s vs. 92 s) with only a ~4% compression 
> ratio penalty (output grows from 194 MB to 212 MB, against 452 MB 
> uncompressed). Here are some results on a real-world table with 3000 
> partitions with incremental stats.
>  
> | Compression setting      | Serialization time (seconds) | Output size (MB) |
> | Gzip best compression    | 92                           | 194              |
> | Gzip fastest compression | 11                           | 212              |
> | Gzip default compression | 57                           | 195              |
> | No compression           | 5                            | 452              |
>  
>  
>  
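For illustration only, here is a minimal, self-contained sketch of how the
trade-off described above could be measured with java.util.zip. It is not
Impala's actual code: the class name DeflateLevelSketch, the extra "level"
parameter on deflateCompress, the inflate helper, and the synthetic payload
are all assumptions made for this example, and timings on such a payload
will not match the figures in the table.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

// Hypothetical helper class; not part of Impala.
class DeflateLevelSketch {
  /** Compresses input at the given Deflater level; returns null for null input. */
  public static byte[] deflateCompress(byte[] input, int level) throws IOException {
    if (input == null) return null;
    Deflater deflater = new Deflater(level);
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    try (DeflaterOutputStream stream = new DeflaterOutputStream(bos, deflater)) {
      stream.write(input);
    } finally {
      // A caller-supplied Deflater is not released by the stream, so end it here.
      deflater.end();
    }
    return bos.toByteArray();
  }

  /** Decompresses a buffer produced by deflateCompress (used to check round trips). */
  public static byte[] inflate(byte[] compressed) throws DataFormatException {
    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    try {
      while (!inflater.finished()) {
        int n = inflater.inflate(buf);
        // Stop on truncated input instead of looping forever.
        if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
        bos.write(buf, 0, n);
      }
    } finally {
      inflater.end();
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    // Synthetic, repetitive payload standing in for serialized stats (an assumption).
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 200_000; i++) {
      sb.append("partition=").append(i % 3000).append(",col=").append(i % 150).append(';');
    }
    byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);

    int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
    for (int level : levels) {
      long start = System.nanoTime();
      byte[] compressed = deflateCompress(input, level);
      long micros = (System.nanoTime() - start) / 1_000;
      boolean roundTrip = Arrays.equals(input, inflate(compressed));
      System.out.printf("level=%d in=%d out=%d time=%dus roundTrip=%b%n",
          level, input.length, compressed.length, micros, roundTrip);
    }
  }
}
```

Running main prints one line per level with the input size, output size, and
elapsed time. A single un-warmed run like this is only a rough indication; a
proper comparison would repeat the measurement on representative data.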



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
