Hi, All.

I'd like to highlight this discussion because it is important and a bit
tricky.

As already mentioned on the mailing list and in PRs, an obvious mistake
slipped through: an improper configuration name, `spark.databricks.*`.

https://github.com/apache/spark/blob/a6f220d951742f4074b37772485ee0ec7a774e7d/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3424

`spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`

In fact, Apache Spark committers had successfully caught this repeated
mistake pattern during review until the following backports slipped into
Apache Spark 3.5.4.

https://github.com/apache/spark/pull/45649
https://github.com/apache/spark/pull/48149
https://github.com/apache/spark/pull/49121

At the time of writing, `spark.databricks.*` has been removed from
`master` and `branch-4.0`, and a new Scalastyle rule was added to protect
the Apache Spark repository from future mistakes.

SPARK-51172 Rename to `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`
SPARK-51173 Add `configName` Scalastyle rule
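To illustrate the kind of invariant such a rule enforces, here is a minimal, self-contained Scala sketch of a config-name check: names must live under the `spark.` namespace but must not use a vendor-specific prefix like `spark.databricks.`. This is only an illustration of the idea; the actual rule added in SPARK-51173 may be implemented quite differently.

```scala
// Hypothetical sketch of a `configName`-style check (NOT the actual
// SPARK-51173 rule): accept `spark.*` config names, reject any name
// under the vendor-specific `spark.databricks.` prefix.
object ConfigNameCheck {
  // Negative lookahead rejects the `spark.databricks.` prefix.
  private val Allowed = "^spark\\.(?!databricks\\.)[A-Za-z0-9.]+$".r

  def isValid(name: String): Boolean = Allowed.matches(name)

  def main(args: Array[String]): Unit = {
    // The renamed config passes; the mistaken name is rejected.
    assert(isValid("spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"))
    assert(!isValid("spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"))
    println("ok")
  }
}
```

A real Scalastyle rule would run a check like this against the string literals passed to `buildConf` at review time, failing the build instead of relying on reviewers to spot the prefix.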

What I propose is to release Apache Spark 3.5.5 next week with the
deprecation, in order to make Apache Spark 4.0 free of `spark.databricks.*`
configurations.

Apache Spark 3.5.5 (February 2025, with a deprecation warning and an
alternative)
Apache Spark 4.0.0 (March 2025, without the `spark.databricks.*` config)

In addition, I'd like to volunteer as the release manager of Apache Spark
3.5.5 for a swift release. WDYT?

FYI, `branch-3.5` has 37 patches currently.

$ git log --oneline v3.5.4..HEAD | wc -l
      37

Best Regards,
Dongjoon.
