Joe McDonnell created IMPALA-12076:
--------------------------------------
Summary: Potential performance improvement using ZSTD's
ZSTD_decompressDCtx interface
Key: IMPALA-12076
URL: https://issues.apache.org/jira/browse/IMPALA-12076
Project: IMPALA
Issue Type: Improvement
Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Joe McDonnell
ORC-639 notes that ZSTD's simple interface, ZSTD_decompress(), initializes a
decompression context on every call. When calling ZSTD_decompress() many times,
it is better to allocate the context once and pass it to ZSTD_decompressDCtx(),
which avoids the repeated initialization.
The ZSTD header (zstd.h) makes the same recommendation:
{noformat}
/*= Decompression context
* When decompressing many times,
* it is recommended to allocate a context only once,
* and re-use it for each successive compression operation.
* This will make workload friendlier for system's memory.
* Use one context per thread for parallel execution. */
typedef struct ZSTD_DCtx_s ZSTD_DCtx;
{noformat}
We should investigate using ZSTD_decompressDCtx() in ZstandardDecompressor
(decompress.h/.cc). We already reuse a context for the streaming decompression
mode, but the same should apply to block decompression. Something similar is
possible for compression as well, using ZSTD_compressCCtx() with a reusable
ZSTD_CCtx.