[
https://issues.apache.org/jira/browse/SPARK-48456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039865#comment-18039865
]
David Milicevic commented on SPARK-48456:
-----------------------------------------
As with SPARK-48457 (Testing and operational readiness), I'm marking this
item as done as part of the cleanup before the Spark 4.1 release.
We ran benchmarks, though I'm not sure how many of them are relevant to OSS.
The one that definitely matters is the benchmark against equivalent code
written in PySpark: on average, our performance was slightly lower (~10%),
which was expected.
To close this performance gap, we have created a set of work items covering
how local variables, simple conditions, and similar constructs should be
handled; those should address it.
Beyond that, I don't think we need more benchmarks until we figure out how to
introduce new concepts, such as plan caching.
Tagging [~milan.dankovic] here as well, to fill in any missing details.
[~cloud_fan] please assign this one to Milan as well.
> [M1] Performance benchmark
> --------------------------
>
> Key: SPARK-48456
> URL: https://issues.apache.org/jira/browse/SPARK-48456
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: David Milicevic
> Priority: Major
>
> Performance parity is officially an M2 requirement, but by the end of M0 I
> think we should start running some perf benchmarks to figure out where we
> stand at the beginning and whether we need to change something right from
> the start, before we begin work on more complex features.