Hi all,

Let's start a new thread to discuss the ongoing features for the Spark 3.0 preview release.
Below is the feature list for the Spark 3.0 preview release, collected from previous discussions on the dev list.

- Follow-up of the shuffle+repartition correctness issue: support rolling back shuffle stages (https://issues.apache.org/jira/browse/SPARK-25341)
- Upgrade the built-in Hive to 2.3.5 for hadoop-3.2 (https://issues.apache.org/jira/browse/SPARK-23710)
- JDK 11 support (https://issues.apache.org/jira/browse/SPARK-28684)
- Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
- DataSourceV2 features
  - Enable file source v2 writers (https://issues.apache.org/jira/browse/SPARK-27589)
  - CREATE TABLE USING with DataSourceV2
  - New pushdown API for DataSourceV2
  - Support DELETE/UPDATE/MERGE operations in DataSourceV2 (https://issues.apache.org/jira/browse/SPARK-28303)
- Correctness issue: Stream-stream joins - left outer join gives inconsistent output (https://issues.apache.org/jira/browse/SPARK-26154)
- Revisiting Python / pandas UDF (https://issues.apache.org/jira/browse/SPARK-28264)
- Spark Graph (https://issues.apache.org/jira/browse/SPARK-25994)

Features that are nice to have:

- Use remote storage for persisting shuffle data (https://issues.apache.org/jira/browse/SPARK-25299)
- Spark + Hadoop + Parquet + Avro compatibility problems (https://issues.apache.org/jira/browse/SPARK-25588)
- Introduce new option to Kafka source - specify timestamp to start and end offset (https://issues.apache.org/jira/browse/SPARK-26848)
- Delete files after processing in structured streaming (https://issues.apache.org/jira/browse/SPARK-20568)

I propose cutting the branch on October 15th. If your features target the 3.0 preview release, please prioritize the work and finish it before that date. Note that October 15th is not the code freeze for Spark 3.0: the community will continue working on features for the upcoming Spark 3.0 release, even if they are not included in the preview release.

The goal of the preview release is to collect more feedback from the community on the new 3.0 features and behavior changes.

Thanks!