Nice summary. Thanks, Dongjoon. One minor correction: I believe we dropped
R < 3.5 support in branch-2.4 as well.

On Sun, 4 Oct 2020, 09:17 Dongjoon Hyun, <dongjoon.h...@gmail.com> wrote:

> Hi, All.
>
> As of today, the master branch (Apache Spark 3.1.0) has resolved
> 852+ JIRA issues, and 606+ of them are 3.1.0-only patches.
> According to the 3.1.0 release window, branch-3.1 will be
> created on November 1st and will enter the QA period.
>
> Here are some notable updates I've been monitoring.
>
> *Language*
> 01. SPARK-25075 Support Scala 2.13
>       - Since SPARK-32926, the Scala 2.13 build test has
>         become part of the GitHub Actions jobs.
>       - After SPARK-33044, the Scala 2.13 tests will be
>         part of the Jenkins jobs.
> 02. SPARK-29909 Drop Python 2 and Python 3.4 and 3.5
> 03. SPARK-32082 Project Zen: Improving Python usability
>       - 7 of 16 issues are resolved.
> 04. SPARK-32073 Drop R < 3.5 support
>       - This is done for Spark 3.0.1 and 3.1.0.
>
> *Dependency*
> 05. SPARK-32058 Use Apache Hadoop 3.2.0 dependency
>       - This changes the default distribution for better cloud support
> 06. SPARK-32981 Remove hive-1.2 distribution
> 07. SPARK-20202 Remove references to org.spark-project.hive
>       - This will remove Hive 1.2.1 from the source code
> 08. SPARK-29250 Upgrade to Hadoop 3.2.1 (WIP)
>
> *Core*
> 09. SPARK-27495 Support Stage level resource conf and scheduling
>       - 11 of 15 issues are resolved (see the sketch after this list)
> 10. SPARK-25299 Use remote storage for persisting shuffle data
>       - 8 of 14 issues are resolved
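>
> As a quick illustration of (9), here is a minimal sketch of the
> stage-level scheduling API, assuming a running SparkContext `sc`,
> a cluster that actually exposes a "gpu" resource, and an illustrative
> input path; `scoreOnGpu` is a placeholder, not a real API:
>
>   import org.apache.spark.resource.{ExecutorResourceRequests,
>     ResourceProfileBuilder, TaskResourceRequests}
>
>   // Placeholder for the real GPU-backed work.
>   def scoreOnGpu(line: String): Int = line.length
>
>   // Executors for the tagged stages should have 4 cores, 8g of heap
>   // and 2 GPUs; each task claims 1 CPU and 1 GPU.
>   val execReqs = new ExecutorResourceRequests()
>     .cores(4).memory("8g").resource("gpu", 2)
>   val taskReqs = new TaskResourceRequests().cpus(1).resource("gpu", 1)
>   val profile = new ResourceProfileBuilder()
>     .require(execReqs).require(taskReqs).build()
>
>   // Only the stages computed from this RDD use the profile above.
>   val scored = sc.textFile("hdfs:///data/input")
>     .repartition(64)
>     .withResources(profile)
>     .map(scoreOnGpu)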
>
> *Resource Manager*
> 11. SPARK-33005 Kubernetes GA preparation
>       - It is underway, and we are waiting for more feedback.
>
> *SQL*
> 12. SPARK-30648/SPARK-32346 Support filter pushdown
>       to JSON/Avro (see the first sketch after this list)
> 13. SPARK-32948/SPARK-32958 Add Json expression optimizer
> 14. SPARK-12312 Support JDBC Kerberos w/ keytab
>       - 11 of 17 issues are resolved (see the JDBC sketch after this list)
> 15. SPARK-27589 DSv2 was mostly completed in 3.0
>       and gained more features in 3.1, but we are still missing:
>       - All built-in DataSource v2 write paths are disabled,
>         and the v1 write path is used instead.
>       - Support for partition pruning with subqueries
>       - Support for bucketing
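>
> As an illustration of (12), here is a minimal sketch of a JSON read
> where the new filter pushdown can kick in, assuming a SparkSession
> `spark` (e.g. in spark-shell); the path and schema are made up for
> illustration only:
>
>   import org.apache.spark.sql.types.{IntegerType, LongType,
>     StringType, StructType}
>
>   val schema = new StructType()
>     .add("id", LongType)
>     .add("age", IntegerType)
>     .add("name", StringType)
>
>   val adults = spark.read
>     .schema(schema)
>     .json("s3a://bucket/events/*.json")
>     .where("age >= 21")   // candidate for pushdown into the JSON parser
>
>   // If the filter is pushed, it should show up as a pushed filter
>   // on the scan node of the physical plan.
>   adults.explain()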
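>
> And for (14), a sketch of what a Kerberized JDBC read is expected to
> look like with the keytab/principal options; the URL, table, principal
> and keytab path are placeholders, and whether the options apply depends
> on the JDBC connection provider for your database:
>
>   val orders = spark.read
>     .format("jdbc")
>     .option("url", "jdbc:postgresql://db.example.com:5432/sales")
>     .option("dbtable", "public.orders")
>     .option("keytab", "/etc/security/keytabs/spark.keytab")  // assumed option name
>     .option("principal", "spark/client@EXAMPLE.COM")         // assumed option name
>     .load()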
>
> We still have one month before the feature freeze
> and the start of QA. If you are working on 3.1,
> please consider the timeline and share your schedule
> with the Apache Spark community. Everything else
> can go into the 3.2 release, scheduled for June 2021.
>
> Last but not least, I want to emphasize (7) once again.
> We need to remove the forked, unofficial Hive eventually.
> Please let us know your reasons if you need to build
> Apache Spark 3.1 from source with Hive 1.2.
>
> https://github.com/apache/spark/pull/29936
>
> As I wrote in the PR description above, for old releases,
> Apache Spark 2.4 (LTS) and 3.0 (~2021.12) will provide
> a Hive 1.2-based distribution.
>
> Bests,
> Dongjoon.
>
