bjornjorgensen commented on code in PR #608: URL: https://github.com/apache/spark-website/pull/608#discussion_r2104985282
########## releases/_posts/2025-05-23-spark-release-4-0-0.md: ##########
@@ -0,0 +1,694 @@
+---
+layout: post
+title: Spark Release 4.0.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+  _edit_last: '4'
+  _wpas_done_all: '1'
+---
+
+Apache Spark 4.0.0 marks a significant milestone as the inaugural release in the 4.x series, embodying the collective effort of the vibrant open-source community. This release is a testament to tremendous collaboration, resolving over 5100 tickets with contributions from more than 390 individuals.
+
+Spark Connect continues its rapid advancement, delivering substantial improvements:
+- A new lightweight Python client ([pyspark-client](https://pypi.org/project/pyspark-client)) at just 1.5 MB.
+- Full API compatibility for the Java client.
+- Greatly expanded API coverage.
+- ML on Spark Connect.
+- A new client implementation for [Swift](https://github.com/apache/spark-connect-swift).
+
+Spark SQL is significantly enriched with powerful new features designed to boost expressiveness and versatility for SQL workloads, such as VARIANT data type support, SQL user-defined functions, session variables, pipe syntax, and string collation.
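These features compose naturally. A minimal PySpark sketch, assuming a 4.0 `SparkSession` named `spark`; the `sales` table and all other identifiers are invented:

```python
# Session variables (SPARK-42849) and a SQL user-defined function (SPARK-46057).
spark.sql("DECLARE VARIABLE min_price DOUBLE DEFAULT 10.0")
spark.sql("SET VARIABLE min_price = 25.0")
spark.sql(
    "CREATE TEMPORARY FUNCTION with_tax(price DOUBLE) "
    "RETURNS DOUBLE RETURN price * 1.25"
)

# SQL pipe syntax (SPARK-49555): each |> step transforms the previous result.
spark.sql("""
    FROM sales
    |> WHERE price > min_price
    |> SELECT item, with_tax(price) AS gross
""").show()

# VARIANT (SPARK-45827): parse semi-structured JSON once, extract typed fields.
spark.sql(
    """SELECT variant_get(parse_json('{"a": {"b": 42}}'), '$.a.b', 'int')"""
).show()
```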
+
+PySpark sees continuous dedication to both its functional breadth and the overall developer experience, bringing a native plotting API, a new Python Data Source API, support for Python UDTFs, and unified profiling for PySpark UDFs, alongside numerous other enhancements.
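As a quick taste of Python UDTFs, a hedged sketch; the class and the sample query are invented:

```python
from pyspark.sql.functions import udtf

# A table function that emits one row per whitespace-separated token.
@udtf(returnType="word: string, length: int")
class SplitWords:
    def eval(self, text: str):
        for w in text.split():
            yield (w, len(w))

# Usage, assuming an active SparkSession `spark`:
# spark.udtf.register("split_words", SplitWords)
# spark.sql("SELECT * FROM split_words('hello spark world')").show()
```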
+
+Structured Streaming evolves with key additions that provide greater control and ease of debugging, notably the introduction of the Arbitrary State API v2 for more flexible state management and the State Data Source for easier debugging.
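The State Data Source, for instance, lets you inspect a streaming query's state store as an ordinary DataFrame. A sketch, assuming an active `SparkSession` named `spark` and an invented checkpoint path:

```python
# Read the key/value state behind a streaming query's checkpoint.
state_df = spark.read.format("statestore").load("/tmp/checkpoints/wordcount")

# Options such as "batchId" or "operatorId" can narrow the snapshot when a
# query has several stateful operators.
state_df.show(truncate=False)
```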
+
+To download Apache Spark 4.0.0, please visit the [downloads](https://spark.apache.org/downloads.html) page. For [detailed changes](https://issues.apache.org/jira/projects/SPARK/versions/12353359), you can consult JIRA. We have also curated a list of high-level changes here, grouped by major modules.
+
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+
+### Core and Spark SQL Highlights
+
+- [[SPARK-45314]](https://issues.apache.org/jira/browse/SPARK-45314) Drop Scala 2.12 and make Scala 2.13 the default
+- [[SPARK-45315]](https://issues.apache.org/jira/browse/SPARK-45315) Drop JDK 8/11 and make JDK 17 the default
+- [[SPARK-45923]](https://issues.apache.org/jira/browse/SPARK-45923) Spark Kubernetes Operator
+- [[SPARK-45869]](https://issues.apache.org/jira/browse/SPARK-45869) Revisit and improve Spark Standalone Cluster
+- [[SPARK-42849]](https://issues.apache.org/jira/browse/SPARK-42849) Session Variables
+- [[SPARK-44444]](https://issues.apache.org/jira/browse/SPARK-44444) Use ANSI SQL mode by default (see the sketch after this list)
+- [[SPARK-46057]](https://issues.apache.org/jira/browse/SPARK-46057) Support SQL user-defined functions
+- [[SPARK-45827]](https://issues.apache.org/jira/browse/SPARK-45827) Add VARIANT data type
+- [[SPARK-49555]](https://issues.apache.org/jira/browse/SPARK-49555) SQL Pipe syntax
+- [[SPARK-46830]](https://issues.apache.org/jira/browse/SPARK-46830) String Collation support
+- [[SPARK-44265]](https://issues.apache.org/jira/browse/SPARK-44265) Built-in XML data source support
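ANSI mode is the behavior change most likely to surface in existing jobs: arithmetic and casts that previously returned NULL silently now raise errors, and the `try_*` function variants keep the permissive behavior where it is wanted. A small illustration, assuming a `SparkSession` named `spark`:

```python
# Raises a DIVIDE_BY_ZERO error under ANSI mode (the 4.0 default).
try:
    spark.sql("SELECT 1 / 0").collect()
except Exception as e:
    print(type(e).__name__)

# Opt back into null-on-error semantics explicitly.
spark.sql("SELECT try_divide(1, 0)").show()  # NULL
```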
+
+
+### Spark Core
+
+- [[SPARK-49524]](https://issues.apache.org/jira/browse/SPARK-49524) Improve K8s support
+- [[SPARK-47240]](https://issues.apache.org/jira/browse/SPARK-47240) SPIP: Structured Logging Framework for Apache Spark
+- [[SPARK-44893]](https://issues.apache.org/jira/browse/SPARK-44893) `ThreadInfo` improvements for monitoring APIs
+- [[SPARK-46861]](https://issues.apache.org/jira/browse/SPARK-46861) Avoid Deadlock in DAGScheduler
+- [[SPARK-47764]](https://issues.apache.org/jira/browse/SPARK-47764) Cleanup shuffle dependencies based on `ShuffleCleanupMode`
+- [[SPARK-49459]](https://issues.apache.org/jira/browse/SPARK-49459) Support CRC32C for Shuffle Checksum
+- [[SPARK-46383]](https://issues.apache.org/jira/browse/SPARK-46383) Reduce Driver Heap Usage by shortening `TaskInfo.accumulables()` lifespan
+- [[SPARK-45527]](https://issues.apache.org/jira/browse/SPARK-45527) Use fraction-based resource calculation
+- [[SPARK-47172]](https://issues.apache.org/jira/browse/SPARK-47172) Add AES-GCM as an optional AES cipher mode for RPC encryption
+- [[SPARK-47448]](https://issues.apache.org/jira/browse/SPARK-47448) Enable `spark.shuffle.service.removeShuffle` by default
+- [[SPARK-47674]](https://issues.apache.org/jira/browse/SPARK-47674) Enable `spark.metrics.appStatusSource.enabled` by default
+- [[SPARK-48063]](https://issues.apache.org/jira/browse/SPARK-48063) Enable `spark.stage.ignoreDecommissionFetchFailure` by default
+- [[SPARK-48268]](https://issues.apache.org/jira/browse/SPARK-48268) Add `spark.checkpoint.dir` config (see the sketch after this list)
+- [[SPARK-48292]](https://issues.apache.org/jira/browse/SPARK-48292) Revert SPARK-39195 (OutputCommitCoordinator) to fix duplication issues
+- [[SPARK-48518]](https://issues.apache.org/jira/browse/SPARK-48518) Make LZF compression run in parallel
+- [[SPARK-46132]](https://issues.apache.org/jira/browse/SPARK-46132) Support key password for JKS keys for RPC SSL
+- [[SPARK-46456]](https://issues.apache.org/jira/browse/SPARK-46456) Add `spark.ui.jettyStopTimeout` to set Jetty server stop timeout
+- [[SPARK-46256]](https://issues.apache.org/jira/browse/SPARK-46256) Parallel Compression Support for ZSTD
+- [[SPARK-45544]](https://issues.apache.org/jira/browse/SPARK-45544) Integrate SSL support into `TransportContext`
+- [[SPARK-45351]](https://issues.apache.org/jira/browse/SPARK-45351) Change `spark.shuffle.service.db.backend` default value to `ROCKSDB`
+- [[SPARK-44741]](https://issues.apache.org/jira/browse/SPARK-44741) Support regex-based `MetricFilter` in `StatsdSink`
+- [[SPARK-43987]](https://issues.apache.org/jira/browse/SPARK-43987) Separate `finalizeShuffleMerge` Processing to Dedicated Thread Pools
+- [[SPARK-45439]](https://issues.apache.org/jira/browse/SPARK-45439) Reduce memory usage of `LiveStageMetrics.accumIdsToMetricType`
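One small quality-of-life item from the list above: per the SPARK-48268 ticket, the checkpoint directory can now be supplied as a plain config instead of an explicit `setCheckpointDir()` call. A sketch; the app name and path are invented:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("checkpoint-dir-demo")
    .config("spark.checkpoint.dir", "/tmp/spark-checkpoints")
    .getOrCreate()
)

rdd = spark.sparkContext.parallelize(range(10))
rdd.checkpoint()  # picks up spark.checkpoint.dir; no setCheckpointDir() call
print(rdd.count())
```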
+
+
+### Spark SQL
+
+#### Features
+
+- [[SPARK-50541]](https://issues.apache.org/jira/browse/SPARK-50541) Describe Table As JSON
+- [[SPARK-48031]](https://issues.apache.org/jira/browse/SPARK-48031) Support view schema evolution
+- [[SPARK-50883]](https://issues.apache.org/jira/browse/SPARK-50883) Support altering multiple columns in the same command
+- [[SPARK-47627]](https://issues.apache.org/jira/browse/SPARK-47627) Add `SQL MERGE` syntax to enable schema evolution
+- [[SPARK-47430]](https://issues.apache.org/jira/browse/SPARK-47430) Support `GROUP BY` for `MapType`
+- [[SPARK-49093]](https://issues.apache.org/jira/browse/SPARK-49093) `GROUP BY` with MapType nested inside complex type
+- [[SPARK-49098]](https://issues.apache.org/jira/browse/SPARK-49098) Add write options for `INSERT`
+- [[SPARK-49451]](https://issues.apache.org/jira/browse/SPARK-49451) Allow duplicate keys in `parse_json`
+- [[SPARK-46536]](https://issues.apache.org/jira/browse/SPARK-46536) Support `GROUP BY calendar_interval_type`
+- [[SPARK-46908]](https://issues.apache.org/jira/browse/SPARK-46908) Support star clause in `WHERE` clause
+- [[SPARK-36680]](https://issues.apache.org/jira/browse/SPARK-36680) Support dynamic table options via `WITH OPTIONS` syntax
+- [[SPARK-35553]](https://issues.apache.org/jira/browse/SPARK-35553) Improve correlated subqueries
+- [[SPARK-47492]](https://issues.apache.org/jira/browse/SPARK-47492) Widen whitespace rules in lexer to allow Unicode
+- [[SPARK-46246]](https://issues.apache.org/jira/browse/SPARK-46246) `EXECUTE IMMEDIATE` SQL support (see the sketch after these lists)
+- [[SPARK-46207]](https://issues.apache.org/jira/browse/SPARK-46207) Support `MergeInto` in DataFrameWriterV2
+- [[SPARK-50129]](https://issues.apache.org/jira/browse/SPARK-50129) Add DataFrame APIs for subqueries
+- [[SPARK-50075]](https://issues.apache.org/jira/browse/SPARK-50075) DataFrame APIs for table-valued functions
+
+
+#### Functions
+
+- [[SPARK-52016]](https://issues.apache.org/jira/browse/SPARK-52016) New built-in functions in Spark 4.0
+- [[SPARK-44001]](https://issues.apache.org/jira/browse/SPARK-44001) Add option to allow unwrapping protobuf well-known wrapper types
+- [[SPARK-43427]](https://issues.apache.org/jira/browse/SPARK-43427) Spark protobuf: allow upcasting unsigned integer types
+- [[SPARK-44983]](https://issues.apache.org/jira/browse/SPARK-44983) Convert `binary` to `string` by `to_char` for the formats: hex, base64, utf-8
+- [[SPARK-44868]](https://issues.apache.org/jira/browse/SPARK-44868) Convert `datetime` to `string` by `to_char`/`to_varchar`
+- [[SPARK-45796]](https://issues.apache.org/jira/browse/SPARK-45796) Support `MODE() WITHIN GROUP (ORDER BY col)` (see the sketch after these lists)
+- [[SPARK-48658]](https://issues.apache.org/jira/browse/SPARK-48658) Encode/Decode functions report coding errors instead of mojibake
+- [[SPARK-45034]](https://issues.apache.org/jira/browse/SPARK-45034) Support deterministic mode function
+- [[SPARK-44778]](https://issues.apache.org/jira/browse/SPARK-44778) Add the alias `TIMEDIFF` for `TIMESTAMPDIFF`
+- [[SPARK-47497]](https://issues.apache.org/jira/browse/SPARK-47497) Make `to_csv` support arrays/maps/binary as pretty strings
+- [[SPARK-44840]](https://issues.apache.org/jira/browse/SPARK-44840) Make `array_insert()` 1-based for negative indexes
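Two of the additions above in use, as a hedged sketch; the `sales` table is invented and a `SparkSession` named `spark` is assumed:

```python
# EXECUTE IMMEDIATE (SPARK-46246): run dynamically constructed SQL with
# positional parameters bound via USING.
spark.sql(
    "EXECUTE IMMEDIATE 'SELECT * FROM sales WHERE price > ?' USING 25.0"
).show()

# mode() WITHIN GROUP (SPARK-45796): the most frequent value per group.
spark.sql("""
    SELECT item, mode() WITHIN GROUP (ORDER BY price) AS typical_price
    FROM sales
    GROUP BY item
""").show()
```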
+
+#### Query optimization
+
+- [[SPARK-46946]](https://issues.apache.org/jira/browse/SPARK-46946) Supporting broadcast of multiple filtering keys in `DynamicPruning`
+- [[SPARK-48445]](https://issues.apache.org/jira/browse/SPARK-48445) Don't inline UDFs with expensive children
+- [[SPARK-41413]](https://issues.apache.org/jira/browse/SPARK-41413) Avoid shuffle in Storage-Partitioned Join when partition keys mismatch, but expressions are compatible
+- [[SPARK-46941]](https://issues.apache.org/jira/browse/SPARK-46941) Prevent insertion of window group limit node with `SizeBasedWindowFunction`
+- [[SPARK-46707]](https://issues.apache.org/jira/browse/SPARK-46707) Add throwable field to expressions to improve predicate pushdown
+- [[SPARK-47511]](https://issues.apache.org/jira/browse/SPARK-47511) Canonicalize `WITH` expressions by reassigning IDs
+- [[SPARK-46502]](https://issues.apache.org/jira/browse/SPARK-46502) Support timestamp types in `UnwrapCastInBinaryComparison`
+- [[SPARK-46069]](https://issues.apache.org/jira/browse/SPARK-46069) Support unwrap timestamp type to date type
+- [[SPARK-46219]](https://issues.apache.org/jira/browse/SPARK-46219) Unwrap cast in join predicates
+- [[SPARK-45606]](https://issues.apache.org/jira/browse/SPARK-45606) Release restrictions on multi-layer runtime filter
+- [[SPARK-45909]](https://issues.apache.org/jira/browse/SPARK-45909) Remove `NumericType` cast if it can safely up-cast in `IsNotNull`
+
+#### Query execution
+
+- [[SPARK-45592]](https://issues.apache.org/jira/browse/SPARK-45592) Correctness issue in AQE with `InMemoryTableScanExec`
+- [[SPARK-50258]](https://issues.apache.org/jira/browse/SPARK-50258) Fix output column order changed issue after AQE
+- [[SPARK-46693]](https://issues.apache.org/jira/browse/SPARK-46693) Inject `LocalLimitExec` when matching `OffsetAndLimit` or `LimitAndOffset`
+- [[SPARK-48873]](https://issues.apache.org/jira/browse/SPARK-48873) Use `UnsafeRow` in JSON parser
+- [[SPARK-41471]](https://issues.apache.org/jira/browse/SPARK-41471) Reduce Spark shuffle when only one side of a join is `KeyGroupedPartitioning`
+- [[SPARK-45452]](https://issues.apache.org/jira/browse/SPARK-45452) Improve `InMemoryFileIndex` to use `FileSystem.listFiles` API
+- [[SPARK-48649]](https://issues.apache.org/jira/browse/SPARK-48649) Add `ignoreInvalidPartitionPaths` configs for skipping invalid partition paths
+- [[SPARK-45882]](https://issues.apache.org/jira/browse/SPARK-45882) `BroadcastHashJoinExec` propagate partitioning should respect CoalescedHashPartitioning
+
+
+### Spark Connectors
+
+#### Data Source V2 framework
+
+- [[SPARK-45784]](https://issues.apache.org/jira/browse/SPARK-45784) Introduce clustering mechanism to Spark
+- [[SPARK-50820]](https://issues.apache.org/jira/browse/SPARK-50820) DSv2: Conditional nullification of metadata columns in DML
+- [[SPARK-51938]](https://issues.apache.org/jira/browse/SPARK-51938) Improve Storage Partition Join
+- [[SPARK-50700]](https://issues.apache.org/jira/browse/SPARK-50700) `spark.sql.catalog.spark_catalog` supports builtin magic value
+- [[SPARK-48781]](https://issues.apache.org/jira/browse/SPARK-48781) Add Catalog APIs for loading stored procedures
+- [[SPARK-49246]](https://issues.apache.org/jira/browse/SPARK-49246) `TableCatalog#loadTable` should indicate if it's for writing
+- [[SPARK-45965]](https://issues.apache.org/jira/browse/SPARK-45965) Move DSv2 partitioning expressions into functions.partitioning
+- [[SPARK-46272]](https://issues.apache.org/jira/browse/SPARK-46272) Support CTAS using DSv2 sources
+- [[SPARK-46043]](https://issues.apache.org/jira/browse/SPARK-46043) Support create table using DSv2 sources
+- [[SPARK-48668]](https://issues.apache.org/jira/browse/SPARK-48668) Support `ALTER NAMESPACE ... UNSET PROPERTIES` in v2
+- [[SPARK-46442]](https://issues.apache.org/jira/browse/SPARK-46442) DS V2 supports push down `PERCENTILE_CONT` and `PERCENTILE_DISC`
+- [[SPARK-49078]](https://issues.apache.org/jira/browse/SPARK-49078) Support show columns syntax in v2 table
+
+#### Hive Catalog
+
+- [[SPARK-45328]](https://issues.apache.org/jira/browse/SPARK-45328) Remove Hive support prior to 2.0.0
+- [[SPARK-47101]](https://issues.apache.org/jira/browse/SPARK-47101) Allow comma in top-level column names and relax HiveExternalCatalog schema check
+- [[SPARK-45265]](https://issues.apache.org/jira/browse/SPARK-45265) Support Hive 4.0 metastore
+
+#### XML
+
+- [[SPARK-44265]](https://issues.apache.org/jira/browse/SPARK-44265) Built-in XML data source support
+
+#### CSV
+
+- [[SPARK-46862]](https://issues.apache.org/jira/browse/SPARK-46862) Disable CSV column pruning in multi-line mode
+- [[SPARK-46890]](https://issues.apache.org/jira/browse/SPARK-46890) Fix CSV parsing bug with default values and column pruning
+- [[SPARK-50616]](https://issues.apache.org/jira/browse/SPARK-50616) Add File Extension Option to CSV DataSource Writer
+- [[SPARK-49125]](https://issues.apache.org/jira/browse/SPARK-49125) Allow duplicated column names in CSV writing
+- [[SPARK-49016]](https://issues.apache.org/jira/browse/SPARK-49016) Restore behavior for queries from raw CSV files
+- [[SPARK-48807]](https://issues.apache.org/jira/browse/SPARK-48807) Binary support for CSV datasource
+- [[SPARK-48602]](https://issues.apache.org/jira/browse/SPARK-48602) Make csv generator support different output style via `spark.sql.binaryOutputStyle`
+
+#### ORC
+
+- [[SPARK-46648]](https://issues.apache.org/jira/browse/SPARK-46648) Use zstd as the default ORC compression
+- [[SPARK-47456]](https://issues.apache.org/jira/browse/SPARK-47456) Support ORC Brotli codec
+- [[SPARK-41858]](https://issues.apache.org/jira/browse/SPARK-41858) Fix ORC reader perf regression due to DEFAULT value feature
+
+#### Avro
+
+- [[SPARK-47739]](https://issues.apache.org/jira/browse/SPARK-47739) Register logical Avro type
+- [[SPARK-49082]](https://issues.apache.org/jira/browse/SPARK-49082) Widening type promotions in `AvroDeserializer`
+- [[SPARK-46633]](https://issues.apache.org/jira/browse/SPARK-46633) Fix Avro reader to handle zero-length blocks
+- [[SPARK-50350]](https://issues.apache.org/jira/browse/SPARK-50350) Avro: add new function `schema_of_avro` (Scala side)
+- [[SPARK-46930]](https://issues.apache.org/jira/browse/SPARK-46930) Add support for custom prefix for Union type fields in Avro
+- [[SPARK-46746]](https://issues.apache.org/jira/browse/SPARK-46746) Attach codec extension to Avro datasource files
+- [[SPARK-46759]](https://issues.apache.org/jira/browse/SPARK-46759) Support compression level for xz and zstandard in Avro
+- [[SPARK-46766]](https://issues.apache.org/jira/browse/SPARK-46766) Add ZSTD Buffer Pool support for Avro datasource
+- [[SPARK-43380]](https://issues.apache.org/jira/browse/SPARK-43380) Fix Avro data type conversion issues without causing performance regression
+- [[SPARK-48545]](https://issues.apache.org/jira/browse/SPARK-48545) Create `to_avro` and `from_avro` SQL functions
+- [[SPARK-46990]](https://issues.apache.org/jira/browse/SPARK-46990) Fix loading empty Avro files (infinite loop)
+
+#### JDBC
+
+- [[SPARK-47361]](https://issues.apache.org/jira/browse/SPARK-47361) Improve JDBC data sources
+- [[SPARK-44977]](https://issues.apache.org/jira/browse/SPARK-44977) Upgrade Derby to 10.16.1.1
+- [[SPARK-47044]](https://issues.apache.org/jira/browse/SPARK-47044) Add executed query for JDBC external datasources to explain output
+- [[SPARK-45139]](https://issues.apache.org/jira/browse/SPARK-45139) Add `DatabricksDialect` to handle SQL type conversion
+
+#### Other notable Spark Connectors changes
+
+- [[SPARK-45905]](https://issues.apache.org/jira/browse/SPARK-45905) Least common type between decimal types should retain integral digits first
+- [[SPARK-45786]](https://issues.apache.org/jira/browse/SPARK-45786) Fix inaccurate Decimal multiplication and division results
+- [[SPARK-50705]](https://issues.apache.org/jira/browse/SPARK-50705) Make `QueryPlan` lock-free
+- [[SPARK-46743]](https://issues.apache.org/jira/browse/SPARK-46743) Fix corner-case with `COUNT` + constant folding subquery
+- [[SPARK-47509]](https://issues.apache.org/jira/browse/SPARK-47509) Block subquery expressions in lambda/higher-order functions for correctness
+- [[SPARK-48498]](https://issues.apache.org/jira/browse/SPARK-48498) Always do char padding in predicates
+- [[SPARK-45915]](https://issues.apache.org/jira/browse/SPARK-45915) Treat decimal(x, 0) the same as IntegralType in PromoteStrings
+- [[SPARK-46220]](https://issues.apache.org/jira/browse/SPARK-46220) Restrict charsets in decode()
+- [[SPARK-45816]](https://issues.apache.org/jira/browse/SPARK-45816) Return `NULL` when overflowing during casting from timestamp to integers
+- [[SPARK-45586]](https://issues.apache.org/jira/browse/SPARK-45586) Reduce compiler latency for plans with large expression trees
+- [[SPARK-45507]](https://issues.apache.org/jira/browse/SPARK-45507) Correctness fix for nested correlated scalar subqueries with `COUNT` aggregates
+- [[SPARK-44550]](https://issues.apache.org/jira/browse/SPARK-44550) Enable correctness fixes for null `IN` (empty list) under ANSI
+- [[SPARK-47911]](https://issues.apache.org/jira/browse/SPARK-47911) Introduce a universal `BinaryFormatter` to make binary output consistent
+
+
+### PySpark Highlights
+
+- [[SPARK-49530]](https://issues.apache.org/jira/browse/SPARK-49530) Introducing PySpark Plotting API
+- [[SPARK-47540]](https://issues.apache.org/jira/browse/SPARK-47540) SPIP: Pure Python Package (Spark Connect)
+- [[SPARK-50132]](https://issues.apache.org/jira/browse/SPARK-50132) Add DataFrame API for Lateral Joins
+- [[SPARK-45981]](https://issues.apache.org/jira/browse/SPARK-45981) Improve Python language test coverage
+- [[SPARK-46858]](https://issues.apache.org/jira/browse/SPARK-46858) Upgrade Pandas to 2
+- [[SPARK-46910]](https://issues.apache.org/jira/browse/SPARK-46910) Eliminate JDK Requirement in PySpark Installation
+- [[SPARK-47274]](https://issues.apache.org/jira/browse/SPARK-47274) Provide more useful context for DataFrame API errors
+- [[SPARK-44076]](https://issues.apache.org/jira/browse/SPARK-44076) SPIP: Python Data Source API (see the sketch after this list)
+- [[SPARK-43797]](https://issues.apache.org/jira/browse/SPARK-43797) Python User-defined Table Functions
+- [[SPARK-46685]](https://issues.apache.org/jira/browse/SPARK-46685) PySpark UDF Unified Profiling
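The Python Data Source API deserves a closer look: a data source is now a small pure-Python class. A minimal sketch; the source name, schema, and generated rows are all invented:

```python
from pyspark.sql.datasource import DataSource, DataSourceReader

class FibonacciReader(DataSourceReader):
    def read(self, partition):
        # Yield plain tuples matching the declared schema.
        a, b = 0, 1
        for _ in range(10):
            yield (a,)
            a, b = b, a + b

class FibonacciSource(DataSource):
    @classmethod
    def name(cls):
        return "fibonacci"

    def schema(self):
        return "value int"

    def reader(self, schema):
        return FibonacciReader()

# Usage, assuming an active SparkSession `spark`:
# spark.dataSource.register(FibonacciSource)
# spark.read.format("fibonacci").load().show()
```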
+
+#### DataFrame APIs and Features
+
+- [[SPARK-51079]](https://issues.apache.org/jira/browse/SPARK-51079) Support large variable types in pandas UDF, `createDataFrame` and `toPandas` with Arrow
+- [[SPARK-50718]](https://issues.apache.org/jira/browse/SPARK-50718) Support `addArtifact(s)` for PySpark
+- [[SPARK-50778]](https://issues.apache.org/jira/browse/SPARK-50778) Add `metadataColumn` to PySpark DataFrame
+- [[SPARK-50719]](https://issues.apache.org/jira/browse/SPARK-50719) Support `interruptOperation` for PySpark
+- [[SPARK-50790]](https://issues.apache.org/jira/browse/SPARK-50790) Implement `parse_json` in PySpark
+- [[SPARK-49306]](https://issues.apache.org/jira/browse/SPARK-49306) Create SQL function aliases for `zeroifnull` and `nullifzero`
+- [[SPARK-50132]](https://issues.apache.org/jira/browse/SPARK-50132) Add DataFrame API for Lateral Joins
+- [[SPARK-43295]](https://issues.apache.org/jira/browse/SPARK-43295) Support string type columns for `DataFrameGroupBy.sum`
+- [[SPARK-45575]](https://issues.apache.org/jira/browse/SPARK-45575) Support time travel options for `df.read` API
+- [[SPARK-45755]](https://issues.apache.org/jira/browse/SPARK-45755) Improve `Dataset.isEmpty()` by applying global limit 1
+  - Improves performance of `isEmpty()` by pushing down a global limit of 1.

Review Comment:
   missing link

########## releases/_posts/2025-05-23-spark-release-4-0-0.md: ##########
+- [[SPARK-45755]](https://issues.apache.org/jira/browse/SPARK-45755) Improve `Dataset.isEmpty()` by applying global limit 1
+  - Improves performance of `isEmpty()` by pushing down a global limit of 1.
+- [[SPARK-48761]](https://issues.apache.org/jira/browse/SPARK-48761) Introduce `clusterBy` DataFrameWriter API for Scala
+- [[SPARK-45929]](https://issues.apache.org/jira/browse/SPARK-45929) Support `groupingSets` operation in DataFrame API (sketched below)
+  - Extends `groupingSets(...)` to DataFrame/DS-level APIs.

Review Comment:
   missing link
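For reference, the two writer/aggregation additions flagged above look roughly like this in PySpark. A hedged sketch; the data, table name, and columns are invented, and the `clusterBy` ticket above tracks the Scala API, with a Python counterpart in 4.0:

```python
from pyspark.sql import functions as F

# Toy data, assuming an active SparkSession `spark`.
df = spark.createDataFrame(
    [("us", "book", 12.0), ("us", "pen", 2.5), ("eu", "book", 11.0)],
    "region string, item string, price double",
)

# Declare clustering columns when writing out a table (SPARK-48761).
df.write.clusterBy("region").saveAsTable("events_clustered")

# DataFrame-level grouping sets, previously reachable only through SQL
# GROUPING SETS (SPARK-45929).
(df.groupingSets([["region"], ["region", "item"]], "region", "item")
   .agg(F.sum("price").alias("total"))
   .show())
```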
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org