Yeachan Park created SPARK-39763:
------------------------------------
Summary: Executor memory footprint substantially increases while
reading zstd compressed parquet files
Key: SPARK-39763
URL: https://issues.apache.org/jira/browse/SPARK-39763
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.2.0
Reporter: Yeachan Park
Hi all,
While transitioning from the default snappy compression to zstd, we noticed a
substantial increase in executor memory usage whilst reading and processing zstd
compressed parquet files.
The memory footprint increased nearly threefold in some cases.
Reading and processing snappy compressed files while writing the output as zstd
did not result in this behaviour.
To reproduce:
# Set "spark.sql.parquet.compression.codec" to zstd
# Write some parquet files; with the option above set, compression will default
to zstd
# Read the zstd compressed files back and run some transformations. Compare the
executor memory usage against running the same transformations on snappy
compressed parquet files.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)