Here are some updates on the JIRA tickets that we want to resolve before Spark 2.4.
green: merged
orange: in progress
red: likely to miss

SPARK-24374 <https://issues.apache.org/jira/browse/SPARK-24374>: Support Barrier Execution Mode in Apache Spark
The core functionality is finished, but we still need to add the Python API. Tracked by SPARK-24822 <https://issues.apache.org/jira/browse/SPARK-24822>. (A short sketch of the current Scala API is at the bottom of this mail.)

SPARK-23899 <https://issues.apache.org/jira/browse/SPARK-23899>: Built-in SQL Function Improvement
I think it's ready to go. A few functions are still in progress, but the common ones have all been merged.

SPARK-14220 <https://issues.apache.org/jira/browse/SPARK-14220>: Build and test Spark against Scala 2.12
It's close; just one last piece remains. Tracked by SPARK-25029 <https://issues.apache.org/jira/browse/SPARK-25029>.

SPARK-4502 <https://issues.apache.org/jira/browse/SPARK-4502>: Spark SQL reads unnecessary nested fields from Parquet
Being reviewed.

SPARK-24882 <https://issues.apache.org/jira/browse/SPARK-24882>: Data source V2 API improvement
The PR is out and being reviewed.

SPARK-24252 <https://issues.apache.org/jira/browse/SPARK-24252>: Add catalog support in Data Source V2
Being reviewed.

SPARK-24768 <https://issues.apache.org/jira/browse/SPARK-24768>: Have a built-in AVRO data source implementation
It's close; the last remaining piece is decimal type support. (A usage sketch is at the bottom of this mail.)

SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>: Shuffle+Repartition on an RDD could lead to incorrect answers
It turns out to be a very complicated issue; there is no consensus yet on the right fix. It will likely miss Spark 2.4, since it's a long-standing issue rather than a regression.

SPARK-24598 <https://issues.apache.org/jira/browse/SPARK-24598>: Datatype overflow conditions gives incorrect result
We decided to keep the current behavior in Spark 2.4 and add documentation (already done). We will reconsider this change in Spark 3.0. (A small example of the behavior we are keeping is at the bottom of this mail.)

SPARK-24020 <https://issues.apache.org/jira/browse/SPARK-24020>: Sort-merge join inner range optimization
There are ongoing discussions about the design; I don't think we can reach a consensus in time for Spark 2.4.

SPARK-24296 <https://issues.apache.org/jira/browse/SPARK-24296>: Replicating large blocks over 2GB
Being reviewed.

SPARK-23874 <https://issues.apache.org/jira/browse/SPARK-23874>: Upgrade to Apache Arrow 0.10.0
Apache Arrow 0.10.0 has some critical bug fixes and is currently being voted on; we should wait a few days for the release.

Given the status above, I think we should wait a few more days. Any objections?

Thanks,
Wenchen

On Tue, Aug 7, 2018 at 3:39 AM Sean Owen <sro...@gmail.com> wrote:

> ... and we still have a few snags with Scala 2.12 support at
> https://issues.apache.org/jira/browse/SPARK-25029
>
> There is some hope of resolving it on the order of a week, so for the
> moment, seems worth holding 2.4 for.
>
> On Mon, Aug 6, 2018 at 2:37 PM Bryan Cutler <cutl...@gmail.com> wrote:
>
>> Hi All,
>>
>> I'd like to request a few days extension to the code freeze to complete
>> the upgrade to Apache Arrow 0.10.0, SPARK-23874. This upgrade includes
>> several key improvements and bug fixes. The RC vote just passed this
>> morning and code changes are complete in
>> https://github.com/apache/spark/pull/21939. We just need some time for
>> the release artifacts to be available. Thoughts?
>>
>> Thanks,
>> Bryan
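(Sketch referenced under SPARK-24374 above.) For anyone who wants to try the barrier execution mode that is already merged on the Scala/Java side, here is a minimal sketch of how I understand the current API. The app name, data, and partition count are made up for illustration, and a barrier stage needs at least as many free slots as it has tasks (hence local[4] below).

```scala
import org.apache.spark.{BarrierTaskContext, SparkConf, SparkContext}

object BarrierModeSketch {
  def main(args: Array[String]): Unit = {
    // local[4] gives enough slots for the 4 barrier tasks launched below.
    val sc = new SparkContext(
      new SparkConf().setAppName("barrier-demo").setMaster("local[4]"))

    // rdd.barrier() marks the stage as a barrier stage: all tasks are
    // scheduled together and can coordinate via BarrierTaskContext.
    val result = sc.parallelize(1 to 100, numSlices = 4)
      .barrier()
      .mapPartitions { iter =>
        val ctx = BarrierTaskContext.get()
        // Block until every task in the stage reaches this point,
        // similar to an MPI-style barrier.
        ctx.barrier()
        iter.map(_ * 2)
      }
      .collect()

    println(result.take(5).mkString(", "))
    sc.stop()
  }
}
```

The Python API tracked by SPARK-24822 is meant to expose the same barrier()/mapPartitions pattern from PySpark.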
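(Sketch referenced under SPARK-24768 above.) Once the built-in Avro support lands, usage should look like any other file-based source. This is a rough sketch, assuming the "avro" short name and that the spark-avro module is on the classpath; the file paths and artifact coordinates are placeholders, not something I've verified against the final release.

```scala
import org.apache.spark.sql.SparkSession

object AvroSketch {
  def main(args: Array[String]): Unit = {
    // Assumes the spark-avro module is on the classpath, e.g. via
    // --packages org.apache.spark:spark-avro_2.11:<2.4 version>.
    val spark = SparkSession.builder()
      .appName("avro-demo")
      .master("local[*]")
      .getOrCreate()

    // Read an existing Avro file with the built-in "avro" format.
    val df = spark.read.format("avro").load("/tmp/input.avro")

    // Write it back out as Avro.
    df.write.format("avro").save("/tmp/output_avro")

    spark.stop()
  }
}
```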
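(Example referenced under SPARK-24598 above.) If I'm reading the ticket right, the behavior we are keeping for 2.4 is Java-style silent wrap-around on integral arithmetic rather than raising an error; a small illustration, assuming default settings:

```scala
import org.apache.spark.sql.SparkSession

object OverflowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("overflow-demo")
      .master("local[*]")
      .getOrCreate()

    // Int.MaxValue + 1 silently wraps around to Int.MinValue (-2147483648)
    // instead of failing or widening the type.
    spark.sql("SELECT CAST(2147483647 AS INT) + CAST(1 AS INT) AS wrapped").show()

    spark.stop()
  }
}
```

As noted above, the plan is only to document this behavior for 2.4 and revisit the change in 3.0.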