[ 
https://issues.apache.org/jira/browse/IMPALA-12076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772831#comment-17772831
 ] 

ASF subversion and git services commented on IMPALA-12076:
----------------------------------------------------------

Commit 5cc358d7ca9d746c8cf063e442b42d7d94bc0e1e in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5cc358d7c ]

IMPALA-12076: Use ZSTD interfaces with reusable context

For repeated compression/decompression, ZSTD recommends
constructing a context once via ZSTD_createCCtx()/ZSTD_createDCtx()
and using the set of interfaces that passes in the context explicitly
to avoid constructing the context on each call.

This follows the recommendation and allocates the ZSTD context once for
each compressor / decompressor and reuses it for the lifetime of the
compressor / decompressor.

This yields a minor speedup on small-scale TPC-H with ZSTD-compressed Parquet:
+----------+------------------------+---------+------------+------------+----------------+
| Workload | File Format            | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+------------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / zstd / block | 3.55    | -1.40%     | 2.52       | -1.63%         |
+----------+------------------------+---------+------------+------------+----------------+

Testing:
 - Ran core job
 - Ran a perf-AB-test job

Change-Id: I5010a56bf8202ccb3f1710425002f81587fd412b
Reviewed-on: http://gerrit.cloudera.org:8080/19773
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Potential performance improvement using ZSTD's ZSTD_decompressDCtx interface
> ----------------------------------------------------------------------------
>
>                 Key: IMPALA-12076
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12076
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.3.0
>            Reporter: Joe McDonnell
>            Priority: Major
>
> ORC-639 notes that ZSTD's simple interface initializes the context on 
> each call to ZSTD_decompress(). When calling ZSTD_decompress() many times, it 
> is better to allocate the context once and use the ZSTD_decompressDCtx() 
> interface to avoid the repeated initialization.
> The ZSTD code mentions that here:
>  
> {noformat}
> /*= Decompression context
>  *  When decompressing many times,
>  *  it is recommended to allocate a context only once,
>  *  and re-use it for each successive compression operation.
>  *  This will make workload friendlier for system's memory.
>  *  Use one context per thread for parallel execution. */
> typedef struct ZSTD_DCtx_s ZSTD_DCtx;
> {noformat}
> We should investigate using this for decompress.h/.cc's 
> ZstandardDecompressor. We already do that for the streaming decompression 
> mode, but this should also apply to block decompression. Something similar is 
> possible for compression as well.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
