Hi all,

As announced on November 3, 2025, we have been reviewing and organizing the
tasks related to Apache Spark 4.1.0 in preparation for its release:

https://lists.apache.org/thread/kyogy86v07d5lsrkdgz4916j2x2pl1kk
([FYI] Spark 4.1 & 4.2 Branch Updates and Tracking)

- SPARK-51166: Prepare Apache Spark 4.1.0
- SPARK-54137: Prepare Apache Spark 4.2.0

Thank you all for your valuable feedback and contributions during the
preparation process. Below is the second progress update for Apache Spark
4.1.0.

As the release manager, I’ve roughly categorized the tasks into two groups:
- The first group is ready for QA, and everyone is highly encouraged to
start testing these items in their environments.
- The second group includes items at risk, which contributors and their
shepherds are encouraged to complete by mid-November.

1. Ready for QA
- SPARK-48094 Reduce GitHub Action usage according to ASF project allowance
- SPARK-50856 Spark Connect Support for TransformWithStateInPandas In Python
- SPARK-51982 Prepare and Configure Pandas API on Spark for ANSI Mode
- SPARK-52012 Restore IDE Index with type annotations
- SPARK-52176 Release Apache Spark via GitHub Actions
- SPARK-52214 Python Arrow UDF
- SPARK-52625 Monthly preview release
- SPARK-52650 User Defined Type Improvements
- SPARK-52857 Improve `Variant` data type support
- SPARK-52984 Pandas on Spark ANSI Improvement
- SPARK-53005 Add ANSI Compliance to Pandas API on Spark
- SPARK-53047 Modernize Spark to leverage the latest Java features
- SPARK-53608 Improve Python Aggregation UDFs
- SPARK-53672 Unified interface for UDF
- SPARK-53736 Real-time Mode in Structured Streaming
- SPARK-53754 Python worker logging infrastructure
- SPARK-53885 Frequency estimation functions
- SPARK-54012 Improve Netty usage patterns
- SPARK-54016 Improve K8s support in Spark 4.1.0
- SPARK-54017 Audit test dependencies in Spark 4.1.0
- SPARK-54248 Changes of the existing configurations in Spark 4.1.0
- SPARK-54249 Improve Spark Event Log, History Server, and Web UI

2. Tasks at Risk
- SPARK-48338 Sql Scripting support for Spark SQL
- SPARK-48515 Enable Arrow optimization for Python UDFs
- SPARK-51162 Add the TIME data type
- SPARK-51207 Constraints in DSv2
- SPARK-51658 Add geospatial types in Spark
- SPARK-51727 Declarative Pipelines
- SPARK-52011 Reduce HDFS NameNode RPC on vectorized Parquet reader
- SPARK-52282 Improve SQL User-defined Functions
- SPARK-52979 Python Arrow UDTF
- SPARK-53484 JDBC Driver for Spark Connect

Thank you, as always, for your continued collaboration and contributions.

Best regards,
Dongjoon Hyun