Here are some updates on the JIRA tickets that we want to resolve before Spark 2.4.
green: merged
orange: in progress
red: likely to miss

SPARK-24374 <https://issues.apache.org/jira/browse/SPARK-24374>: Support Barrier Execution Mode in Apache Spark
The core functionality is finished, but we still need to add the Python API. Tracked by SPARK-24822 <https://issues.apache.org/jira/browse/SPARK-24822>. (A short sketch of the current Scala API is at the bottom of this mail.)

SPARK-23899 <https://issues.apache.org/jira/browse/SPARK-23899>: Built-in SQL Function Improvement
I think it's ready to go. A few functions are still in progress, but the common ones have all been merged.

SPARK-14220 <https://issues.apache.org/jira/browse/SPARK-14220>: Build and test Spark against Scala 2.12
It's close; just one last piece remains. Tracked by SPARK-25029 <https://issues.apache.org/jira/browse/SPARK-25029>.

SPARK-4502 <https://issues.apache.org/jira/browse/SPARK-4502>: Spark SQL reads unnecessary nested fields from Parquet
Being reviewed.

SPARK-24882 <https://issues.apache.org/jira/browse/SPARK-24882>: Data source V2 API improvement
The PR is out and being reviewed.

SPARK-24252 <https://issues.apache.org/jira/browse/SPARK-24252>: Add catalog support in Data Source V2
Being reviewed.

SPARK-24768 <https://issues.apache.org/jira/browse/SPARK-24768>: Have a built-in AVRO data source implementation
It's close; the last remaining piece is decimal type support. (A usage sketch is at the bottom of this mail.)

SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>: Shuffle+Repartition on an RDD could lead to incorrect answers
It turns out to be a very complicated issue; there is no consensus yet on the right fix. It will likely miss Spark 2.4, since it's a long-standing issue rather than a regression.

SPARK-24598 <https://issues.apache.org/jira/browse/SPARK-24598>: Datatype overflow conditions gives incorrect result
We decided to keep the current behavior in Spark 2.4 and add documentation (already done). We will reconsider this change in Spark 3.0. (A small example of the behavior we are keeping is at the bottom of this mail.)

SPARK-24020 <https://issues.apache.org/jira/browse/SPARK-24020>: Sort-merge join inner range optimization
There are ongoing discussions about the design; I don't think we can reach a consensus in time for Spark 2.4.

SPARK-24296 <https://issues.apache.org/jira/browse/SPARK-24296>: Replicating large blocks over 2GB
Being reviewed.

SPARK-23874 <https://issues.apache.org/jira/browse/SPARK-23874>: Upgrade to Apache Arrow 0.10.0
Apache Arrow 0.10.0 has some critical bug fixes and is currently being voted on; we should wait a few days for the release.

Given the status above, I think we should wait a few more days. Any objections?

Thanks,
Wenchen

On Tue, Aug 7, 2018 at 3:39 AM Sean Owen <sro...@gmail.com> wrote:

> ... and we still have a few snags with Scala 2.12 support at
> https://issues.apache.org/jira/browse/SPARK-25029
>
> There is some hope of resolving it on the order of a week, so for the
> moment, seems worth holding 2.4 for.
>
> On Mon, Aug 6, 2018 at 2:37 PM Bryan Cutler <cutl...@gmail.com> wrote:
>
>> Hi All,
>>
>> I'd like to request a few days extension to the code freeze to complete
>> the upgrade to Apache Arrow 0.10.0, SPARK-23874. This upgrade includes
>> several key improvements and bug fixes. The RC vote just passed this
>> morning and code changes are complete in
>> https://github.com/apache/spark/pull/21939. We just need some time for
>> the release artifacts to be available. Thoughts?
>>
>> Thanks,
>> Bryan
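(Sketch referenced under SPARK-24374 above.) For anyone who wants to try the barrier execution mode that is already merged on the Scala/Java side, here is a minimal sketch of how I understand the current API. The app name, data, and partition count are made up for illustration, and a barrier stage needs at least as many free slots as it has tasks (hence local[4] below).

```scala
import org.apache.spark.{BarrierTaskContext, SparkConf, SparkContext}

object BarrierModeSketch {
  def main(args: Array[String]): Unit = {
    // local[4] gives enough slots for the 4 barrier tasks launched below.
    val sc = new SparkContext(
      new SparkConf().setAppName("barrier-demo").setMaster("local[4]"))

    // rdd.barrier() marks the stage as a barrier stage: all tasks are
    // scheduled together and can coordinate via BarrierTaskContext.
    val result = sc.parallelize(1 to 100, numSlices = 4)
      .barrier()
      .mapPartitions { iter =>
        val ctx = BarrierTaskContext.get()
        // Block until every task in the stage reaches this point,
        // similar to an MPI-style barrier.
        ctx.barrier()
        iter.map(_ * 2)
      }
      .collect()

    println(result.take(5).mkString(", "))
    sc.stop()
  }
}
```

The Python API tracked by SPARK-24822 is meant to expose the same barrier()/mapPartitions pattern from PySpark.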
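(Sketch referenced under SPARK-24768 above.) Once the built-in Avro support lands, usage should look like any other file-based source. This is a rough sketch, assuming the "avro" short name and that the spark-avro module is on the classpath; the file paths and artifact coordinates are placeholders, not something I've verified against the final release.

```scala
import org.apache.spark.sql.SparkSession

object AvroSketch {
  def main(args: Array[String]): Unit = {
    // Assumes the spark-avro module is on the classpath, e.g. via
    // --packages org.apache.spark:spark-avro_2.11:<2.4 version>.
    val spark = SparkSession.builder()
      .appName("avro-demo")
      .master("local[*]")
      .getOrCreate()

    // Read an existing Avro file with the built-in "avro" format.
    val df = spark.read.format("avro").load("/tmp/input.avro")

    // Write it back out as Avro.
    df.write.format("avro").save("/tmp/output_avro")

    spark.stop()
  }
}
```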
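(Example referenced under SPARK-24598 above.) If I'm reading the ticket right, the behavior we are keeping for 2.4 is Java-style silent wrap-around on integral arithmetic rather than raising an error; a small illustration, assuming default settings:

```scala
import org.apache.spark.sql.SparkSession

object OverflowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("overflow-demo")
      .master("local[*]")
      .getOrCreate()

    // Int.MaxValue + 1 silently wraps around to Int.MinValue (-2147483648)
    // instead of failing or widening the type.
    spark.sql("SELECT CAST(2147483647 AS INT) + CAST(1 AS INT) AS wrapped").show()

    spark.stop()
  }
}
```

As noted above, the plan is only to document this behavior for 2.4 and revisit the change in 3.0.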