> New pushdown API for DataSourceV2

One correction: I want to revisit the pushdown API to make sure it works for dynamic partition pruning and can be extended to support limit/aggregate/... pushdown in the future. It should be a small API update instead of a new API.
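The email doesn't spell out the API shape, but one common way to make a pushdown surface extensible is optional mix-in capability interfaces, so new kinds of pushdown (limit, aggregate, ...) can be added later without breaking existing sources. The sketch below is purely illustrative — the interface and class names are hypothetical, not Spark's actual DataSourceV2 API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical capability interface: a source that can evaluate some
// filters itself. Names and signatures are illustrative only.
interface SupportsPushDownFilters {
    // Returns the filters the source could NOT push down;
    // the engine must re-apply these after the scan.
    List<String> pushFilters(List<String> filters);
}

// A new kind of pushdown arrives as another optional interface,
// leaving sources that don't implement it untouched.
interface SupportsPushDownLimit {
    // Returns true if the source will honor the limit itself.
    boolean pushLimit(int limit);
}

class ExampleScanBuilder implements SupportsPushDownFilters, SupportsPushDownLimit {
    @Override
    public List<String> pushFilters(List<String> filters) {
        List<String> residual = new ArrayList<>();
        for (String f : filters) {
            // Pretend only partition filters are pushable; everything
            // else is returned as residual for the engine to evaluate.
            if (!f.startsWith("partition=")) {
                residual.add(f);
            }
        }
        return residual;
    }

    @Override
    public boolean pushLimit(int limit) {
        return limit >= 0;
    }
}

public class PushdownSketch {
    public static void main(String[] args) {
        ExampleScanBuilder b = new ExampleScanBuilder();
        System.out.println(b.pushFilters(List.of("partition=2019", "value > 5")));
        System.out.println(b.pushLimit(10));
    }
}
```

The engine can probe each capability with an `instanceof` check, which is why adding a capability stays a small API update rather than a new API.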
On Fri, Sep 20, 2019 at 3:46 PM Xingbo Jiang <jiangxb1...@gmail.com> wrote:

> Hi all,
>
> Let's start a new thread to discuss the ongoing features for the Spark 3.0
> preview release.
>
> Below is the feature list for the Spark 3.0 preview release. The list is
> collected from previous discussions on the dev list.
>
> - Follow-up of the shuffle+repartition correctness issue: support rolling
>   back shuffle stages (https://issues.apache.org/jira/browse/SPARK-25341)
> - Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>   (https://issues.apache.org/jira/browse/SPARK-23710)
> - JDK 11 support (https://issues.apache.org/jira/browse/SPARK-28684)
> - Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
> - DataSourceV2 features
>   - Enable file source v2 writers
>     (https://issues.apache.org/jira/browse/SPARK-27589)
>   - CREATE TABLE USING with DataSourceV2
>   - New pushdown API for DataSourceV2
>   - Support DELETE/UPDATE/MERGE operations in DataSourceV2
>     (https://issues.apache.org/jira/browse/SPARK-28303)
> - Correctness issue: stream-stream joins - left outer join gives
>   inconsistent output (https://issues.apache.org/jira/browse/SPARK-26154)
> - Revisiting Python / pandas UDFs
>   (https://issues.apache.org/jira/browse/SPARK-28264)
> - Spark Graph (https://issues.apache.org/jira/browse/SPARK-25994)
>
> Features that are nice to have:
>
> - Use remote storage for persisting shuffle data
>   (https://issues.apache.org/jira/browse/SPARK-25299)
> - Spark + Hadoop + Parquet + Avro compatibility problems
>   (https://issues.apache.org/jira/browse/SPARK-25588)
> - Introduce a new option to the Kafka source: specify timestamps for the
>   start and end offsets (https://issues.apache.org/jira/browse/SPARK-26848)
> - Delete files after processing in Structured Streaming
>   (https://issues.apache.org/jira/browse/SPARK-20568)
>
> Here, I am proposing to cut the branch on October 15th. If the features
> are targeting the 3.0 preview release, please prioritize the work and
> finish it before that date. Note that Oct. 15th is not the code freeze for
> Spark 3.0; the community will keep working on features for the upcoming
> Spark 3.0 release even if they are not included in the preview release.
> The goal of the preview release is to collect more feedback from the
> community on the new 3.0 features and behavior changes.
>
> Thanks!