Nice summary. Thanks Dongjoon. One minor correction -> I believe we dropped R 3.5 and below at branch 2.4 as well.
On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.h...@gmail.com> wrote:
> Hi, All.
>
> As of today, master branch (Apache Spark 3.1.0) resolved
> 852+ JIRA issues and 606+ issues are 3.1.0-only patches.
> According to the 3.1.0 release window, branch-3.1 will be
> created on November 1st and enters QA period.
>
> Here are some notable updates I've been monitoring.
>
> *Language*
> 01. SPARK-25075 Support Scala 2.13
>     - Since SPARK-32926, Scala 2.13 build test has
>       become a part of GitHub Action jobs.
>     - After SPARK-33044, Scala 2.13 test will be
>       a part of Jenkins jobs.
> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
> 03. SPARK-32082 Project Zen: Improving Python usability
>     - 7 of 16 issues are resolved.
> 04. SPARK-32073 Drop R < 3.5 support
>     - This is done for Spark 3.0.1 and 3.1.0.
>
> *Dependency*
> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>     - This changes the default dist. for better cloud support
> 06. SPARK-32981 Remove hive-1.2 distribution
> 07. SPARK-20202 Remove references to org.spark-project.hive
>     - This will remove Hive 1.2.1 from source code
> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>
> *Core*
> 09. SPARK-27495 Support Stage level resource conf and scheduling
>     - 11 of 15 issues are resolved
> 10. SPARK-25299 Use remote storage for persisting shuffle data
>     - 8 of 14 issues are resolved
>
> *Resource Manager*
> 11. SPARK-33005 Kubernetes GA preparation
>     - It is on the way and we are waiting for more feedback.
>
> *SQL*
> 12. SPARK-30648/SPARK-32346 Support filters pushdown
>     to JSON/Avro
> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>     - 11 of 17 issues are resolved
> 15. SPARK-27589 DSv2 was mostly completed in 3.0
>     and added more features in 3.1 but still we missed
>     - All built-in DataSource v2 write paths are disabled
>       and v1 write is used instead.
>     - Support partition pruning with subqueries
>     - Support bucketing
>
> We still have one month before the feature freeze
> and starting QA. If you are working for 3.1,
> please consider the timeline and share your schedule
> with the Apache Spark community. For the other stuff,
> we can put it into 3.2 release scheduled in June 2021.
>
> Last but not least, I want to emphasize (7) once again.
> We need to remove the forked unofficial Hive eventually.
> Please let us know your reasons if you need to build
> from Apache Spark 3.1 source code for Hive 1.2.
>
> https://github.com/apache/spark/pull/29936
>
> As I wrote in the above PR description, for old releases,
> Apache Spark 2.4(LTS) and 3.0 (~2021.12) will provide
> Hive 1.2-based distribution.
>
> Bests,
> Dongjoon.
>