Michael Taranov created SPARK-38703:
---------------------------------------
Summary: High GC and memory footprint after switch to ZSTD
Key: SPARK-38703
URL: https://issues.apache.org/jira/browse/SPARK-38703
Project: Spark
Issue Type: Question
Components: Input/Output
Affects Versions: 3.1.2
Reporter: Michael Taranov
Hi All,
We have started switching our Spark pipelines to read Parquet files compressed
with ZSTD. Since the switch, the memory footprint of the jobs is much larger
than it was with SNAPPY.
In addition, GC activity for these jobs is much higher than with SNAPPY under
the same workload.
Are there any configurations relevant to the read path that may help in such
cases?