[
https://issues.apache.org/jira/browse/SPARK-39763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yeachan Park updated SPARK-39763:
---------------------------------
Description:
Hi all,
While transitioning from the default snappy compression to zstd, we noticed a
substantial increase in executor memory whilst *reading* and applying
transformations on *zstd* compressed parquet files.
The memory footprint increased threefold in some cases, compared to reading
and applying the same transformations on a parquet file compressed with snappy.
This behaviour only occurs when reading zstd compressed parquet files; writing
a zstd parquet file does not exhibit it.
To reproduce:
# Set "spark.sql.parquet.compression.codec" to zstd
# Write some parquet files, the compression will default to zstd after setting
the option above
# Read the compressed zstd file and run some transformations. Compare the
memory usage of the executor vs running the same transformation on a parquet
file with snappy compression.
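As a rough illustration, here is a minimal sketch of the steps above. The app
name, paths, data size, and the sample aggregation are placeholder assumptions,
not the exact workload we ran:
{code:scala}
import org.apache.spark.sql.SparkSession

// Step 1: set the parquet codec to zstd for this session.
val spark = SparkSession.builder()
  .appName("zstd-read-memory-repro") // placeholder name
  .config("spark.sql.parquet.compression.codec", "zstd")
  .getOrCreate()
import spark.implicits._

// Step 2: write a parquet file; with the codec set above, the output
// files come out zstd compressed.
spark.range(0L, 10000000L)
  .selectExpr("id", "concat('value_', id) AS value")
  .write.mode("overwrite").parquet("/tmp/zstd_repro.parquet")

// Step 3: read it back and apply a transformation, then compare executor
// memory (e.g. the Executors tab in the Spark UI) against an identical run
// with spark.sql.parquet.compression.codec set to snappy.
spark.read.parquet("/tmp/zstd_repro.parquet")
  .groupBy($"id" % 1000)
  .count()
  .write.mode("overwrite").parquet("/tmp/zstd_repro_out.parquet")
{code}
Running the same snippet with the codec set to snappy gives the baseline we
compared against.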
> Executor memory footprint substantially increases while reading zstd
> compressed parquet files
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-39763
> URL: https://issues.apache.org/jira/browse/SPARK-39763
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: Yeachan Park
> Priority: Minor
>
> Hi all,
>
> While transitioning from the default snappy compression to zstd, we noticed a
> substantial increase in executor memory whilst *reading* and applying
> transformations on *zstd* compressed parquet files.
> The memory footprint increased threefold in some cases, compared to reading
> and applying the same transformations on a parquet file compressed with
> snappy.
> This behaviour only occurs when reading zstd compressed parquet files;
> writing a zstd parquet file does not exhibit it.
> To reproduce:
> # Set "spark.sql.parquet.compression.codec" to zstd
> # Write some parquet files; with the option above set, they will be
> compressed with zstd
> # Read the zstd compressed files back and run some transformations. Compare
> the executor's memory usage against running the same transformations on
> snappy compressed parquet files.