I'd like to get SPARK-27296 onto 3.0:
SPARK-27296 <https://issues.apache.org/jira/browse/SPARK-27296> Efficient User Defined Aggregators
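For context, the idea behind that ticket is to let the typed Aggregator API be registered and used as an untyped UDAF on DataFrame columns, avoiding the per-row conversion overhead of UserDefinedAggregateFunction. A minimal sketch of what the usage could look like, assuming the registration helper ends up as a functions.udaf-style call (the LongSum aggregator, the "long_sum" name, and the query below are only illustrative, not the final API):

import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions

// Simple typed Aggregator that sums Long inputs.
object LongSum extends Aggregator[Long, Long, Long] {
  def zero: Long = 0L                               // initial buffer value
  def reduce(buf: Long, in: Long): Long = buf + in  // fold one input row into the buffer
  def merge(b1: Long, b2: Long): Long = b1 + b2     // combine partial sums across partitions
  def finish(buf: Long): Long = buf                 // produce the final result
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val spark = SparkSession.builder.appName("udaf-sketch").getOrCreate()
// Register the Aggregator so it can be called as a UDAF from SQL / DataFrame code.
spark.udf.register("long_sum", functions.udaf(LongSum))
spark.range(10).createOrReplaceTempView("t")
spark.sql("SELECT long_sum(id) FROM t").show()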
On Mon, Oct 7, 2019 at 3:03 PM Xingbo Jiang <jiangxb1...@gmail.com> wrote:

> Hi all,
>
> I went over all the finished JIRA tickets targeted to Spark 3.0.0. Here I'm listing all the notable features and major changes that are ready to test/deliver; please don't hesitate to add more to the list:
>
> SPARK-11215 <https://issues.apache.org/jira/browse/SPARK-11215> Multiple columns support added to various Transformers: StringIndexer
> SPARK-11150 <https://issues.apache.org/jira/browse/SPARK-11150> Implement Dynamic Partition Pruning
> SPARK-13677 <https://issues.apache.org/jira/browse/SPARK-13677> Support Tree-Based Feature Transformation
> SPARK-16692 <https://issues.apache.org/jira/browse/SPARK-16692> Add MultilabelClassificationEvaluator
> SPARK-19591 <https://issues.apache.org/jira/browse/SPARK-19591> Add sample weights to decision trees
> SPARK-19712 <https://issues.apache.org/jira/browse/SPARK-19712> Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.
> SPARK-19827 <https://issues.apache.org/jira/browse/SPARK-19827> R API for Power Iteration Clustering
> SPARK-20286 <https://issues.apache.org/jira/browse/SPARK-20286> Improve logic for timing out executors in dynamic allocation
> SPARK-20636 <https://issues.apache.org/jira/browse/SPARK-20636> Eliminate unnecessary shuffle with adjacent Window expressions
> SPARK-22148 <https://issues.apache.org/jira/browse/SPARK-22148> Acquire new executors to avoid hang because of blacklisting
> SPARK-22796 <https://issues.apache.org/jira/browse/SPARK-22796> Multiple columns support added to various Transformers: PySpark QuantileDiscretizer
> SPARK-23128 <https://issues.apache.org/jira/browse/SPARK-23128> A new approach to do adaptive execution in Spark SQL
> SPARK-23674 <https://issues.apache.org/jira/browse/SPARK-23674> Add Spark ML Listener for Tracking ML Pipeline Status
> SPARK-23710 <https://issues.apache.org/jira/browse/SPARK-23710> Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
> SPARK-24333 <https://issues.apache.org/jira/browse/SPARK-24333> Add fit with validation set to Gradient Boosted Trees: Python API
> SPARK-24417 <https://issues.apache.org/jira/browse/SPARK-24417> Build and Run Spark on JDK11
> SPARK-24615 <https://issues.apache.org/jira/browse/SPARK-24615> Accelerator-aware task scheduling for Spark
> SPARK-24920 <https://issues.apache.org/jira/browse/SPARK-24920> Allow sharing Netty's memory pool allocators
> SPARK-25250 <https://issues.apache.org/jira/browse/SPARK-25250> Fix race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple times
> SPARK-25341 <https://issues.apache.org/jira/browse/SPARK-25341> Support rolling back a shuffle map stage and re-generate the shuffle files
> SPARK-25348 <https://issues.apache.org/jira/browse/SPARK-25348> Data source for binary files
> SPARK-25603 <https://issues.apache.org/jira/browse/SPARK-25603> Generalize Nested Column Pruning
> SPARK-26132 <https://issues.apache.org/jira/browse/SPARK-26132> Remove support for Scala 2.11 in Spark 3.0.0
> SPARK-26215 <https://issues.apache.org/jira/browse/SPARK-26215> define reserved keywords after SQL standard
> SPARK-26412 <https://issues.apache.org/jira/browse/SPARK-26412> Allow Pandas UDF to take an iterator of pd.DataFrames
> SPARK-26785 <https://issues.apache.org/jira/browse/SPARK-26785> data source v2 API refactor: streaming write
> SPARK-26956 <https://issues.apache.org/jira/browse/SPARK-26956> remove streaming output mode from data source v2 APIs
> SPARK-27064 <https://issues.apache.org/jira/browse/SPARK-27064> create StreamingWrite at the beginning of streaming execution
> SPARK-27119 <https://issues.apache.org/jira/browse/SPARK-27119> Do not infer schema when reading Hive serde table with native data source
> SPARK-27225 <https://issues.apache.org/jira/browse/SPARK-27225> Implement join strategy hints
> SPARK-27240 <https://issues.apache.org/jira/browse/SPARK-27240> Use pandas DataFrame for struct type argument in Scalar Pandas UDF
> SPARK-27338 <https://issues.apache.org/jira/browse/SPARK-27338> Fix deadlock between TaskMemoryManager and UnsafeExternalSorter$SpillableIterator
> SPARK-27396 <https://issues.apache.org/jira/browse/SPARK-27396> Public APIs for extended Columnar Processing Support
> SPARK-27589 <https://issues.apache.org/jira/browse/SPARK-27589> Re-implement file sources with data source V2 API
> SPARK-27677 <https://issues.apache.org/jira/browse/SPARK-27677> Disk-persisted RDD blocks served by shuffle service, and ignored for Dynamic Allocation
> SPARK-27699 <https://issues.apache.org/jira/browse/SPARK-27699> Partially push down disjunctive predicates in Parquet/ORC
> SPARK-27763 <https://issues.apache.org/jira/browse/SPARK-27763> Port test cases from PostgreSQL to Spark SQL (ongoing)
> SPARK-27884 <https://issues.apache.org/jira/browse/SPARK-27884> Deprecate Python 2 support
> SPARK-27921 <https://issues.apache.org/jira/browse/SPARK-27921> Convert applicable *.sql tests into UDF integrated test base
> SPARK-27963 <https://issues.apache.org/jira/browse/SPARK-27963> Allow dynamic allocation without an external shuffle service
> SPARK-28177 <https://issues.apache.org/jira/browse/SPARK-28177> Adjust post shuffle partition number in adaptive execution
> SPARK-28372 <https://issues.apache.org/jira/browse/SPARK-28372> Document Spark WEB UI
> SPARK-28399 <https://issues.apache.org/jira/browse/SPARK-28399> RobustScaler feature transformer
> SPARK-28426 <https://issues.apache.org/jira/browse/SPARK-28426> Metadata Handling in Thrift Server
> SPARK-28588 <https://issues.apache.org/jira/browse/SPARK-28588> Build a SQL reference doc (ongoing)
> SPARK-28608 <https://issues.apache.org/jira/browse/SPARK-28608> Improve test coverage of ThriftServer
> SPARK-28753 <https://issues.apache.org/jira/browse/SPARK-28753> Dynamically reuse subqueries in AQE
> SPARK-28855 <https://issues.apache.org/jira/browse/SPARK-28855> Remove outdated Experimental, Evolving annotations (SPARK-25908 <https://issues.apache.org/jira/browse/SPARK-25908>)
> SPARK-28980 <https://issues.apache.org/jira/browse/SPARK-28980> Remove deprecated items since <= 2.2.0
>
> Cheers,
>
> Xingbo