I have received a lot of great feedback from the community about the recent
3.0 preview release. Since that release, we have already merged 353
commits [https://github.com/apache/spark/compare/v3.0.0-preview...master].
There are various important features and behavior changes we want the
community to try before we enter the official release candidates of Spark
3.0.


Below are some items I selected that are not part of the last 3.0 preview
but are already available in the upstream master branch:


   - Support JDK 11 with Hadoop 2.7
   - Spark SQL will respect its own default format (i.e., parquet) when
   users do CREATE TABLE without USING or STORED AS clauses
   - Enable Parquet nested schema pruning and nested pruning on expressions
   by default
   - Add observable metrics for streaming queries
   - Column pruning through nondeterministic expressions
   - RecordBinaryComparator should check endianness when compared by long
   - Improve parallelism for local shuffle reader in adaptive query
   execution
   - Upgrade Apache Arrow to version 0.15.1
   - Various interval-related SQL support
   - Add a mode to pin each Python thread to a JVM thread
   - Provide an option to clean up completed files in streaming queries

I am wondering whether we can have another preview release for Spark 3.0.
This would help us find design/API defects as early as possible and avoid a
significant delay of the upcoming Spark 3.0 release.


Also, is any committer willing to volunteer as the release manager of the
next preview release of Spark 3.0, if we have such a release?


Cheers,


Xiao
