[ 
https://issues.apache.org/jira/browse/SPARK-39763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeachan Park updated SPARK-39763:
---------------------------------
    Description: 
Hi all,

 

While transitioning from the default snappy compression to zstd, we noticed a 
substantial increase in executor memory whilst *reading* and applying 
transformations on *zstd* compressed parquet files.

Memory footprint increased 3-fold in some cases, compared to reading and 
applying the same transformations on a parquet file compressed with snappy.

This behaviour only occurs when reading zstd compressed parquet files. Writing 
a zstd parquet file does not result in this behaviour.

To reproduce:
 # Set "spark.sql.parquet.compression.codec" to "zstd"
 # Write some parquet files; the compression will default to zstd after setting 
the option above
 # Read the compressed zstd files back and run some transformations. Compare 
the executor's memory usage against running the same transformations on a 
parquet file with snappy compression.


> Executor memory footprint substantially increases while reading zstd 
> compressed parquet files
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39763
>                 URL: https://issues.apache.org/jira/browse/SPARK-39763
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Yeachan Park
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
