[ https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Aerts updated SPARK-35744:
---------------------------------
Description:
We are filing this issue to report that when we tested Spark 3.2.0 we saw a
significant performance degradation in code that handles Avro Specific
Record objects. This slowed down some of our jobs by a factor of 4.
Spark 3.2.0 upgrades the Avro version from 1.8.2 to 1.10.2.
The degradation is caused by a change introduced in Avro 1.9.0. This change
slows down the creation of Avro specific records in certain classloader
topologies, such as the ones used in Spark.
We reported the problem upstream in the Avro project and
[proposed|https://github.com/apache/avro/pull/1253] a simple fix. (The links
contain more details.)
It is unclear to us how many other projects use Avro specific records in
a Spark context and will be affected by this degradation.
Feel free to close this issue if you consider it too much of a corner
case.
was:
We are filing this issue to report that when we tested Spark 3.2.0 we saw a
significant performance degradation in code that handles Avro Specific
Record objects. This slowed down some of our jobs by a factor of 4.
Spark 3.2.0 upgrades the Avro version from 2.8.2 to 2.10.2.
The degradation is caused by a change introduced in Avro 2.9.0. This change
slows down the creation of Avro specific records in certain classloader
topologies, such as the ones used in Spark.
We reported the problem upstream in the Avro project and
[proposed|https://github.com/apache/avro/pull/1253] a simple fix. (The links
contain more details.)
It is unclear to us how many other projects use Avro specific records in
a Spark context and will be affected by this degradation.
Feel free to close this issue if you consider it too much of a corner
case.
> Performance degradation in avro SpecificRecordBuilders
> ------------------------------------------------------
>
> Key: SPARK-35744
> URL: https://issues.apache.org/jira/browse/SPARK-35744
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: Steven Aerts
> Priority: Minor
>
> We are filing this issue to report that when we tested Spark 3.2.0 we saw
> a significant performance degradation in code that handles Avro
> Specific Record objects. This slowed down some of our jobs by a factor of 4.
> Spark 3.2.0 upgrades the Avro version from 1.8.2 to 1.10.2.
> The degradation is caused by a change introduced in Avro 1.9.0. This change
> slows down the creation of Avro specific records in certain
> classloader topologies, such as the ones used in Spark.
> We reported the problem upstream in the Avro project and
> [proposed|https://github.com/apache/avro/pull/1253] a simple fix. (The links
> contain more details.)
> It is unclear to us how many other projects use Avro specific records
> in a Spark context and will be affected by this degradation.
> Feel free to close this issue if you consider it too much of a
> corner case.