I'd like to add SPARK-24296, replicating large blocks over 2GB. It's been up for review for a while, and it would end the 2GB block limit (well ... subject to a couple of caveats on SPARK-6235).
On Mon, Jul 30, 2018 at 9:01 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
> I went through the open JIRA tickets, and here is a list that we should
> consider for Spark 2.4:
>
> *High Priority*:
> SPARK-24374 <https://issues.apache.org/jira/browse/SPARK-24374>: Support
> Barrier Execution Mode in Apache Spark
> This one is critical to the Spark ecosystem for deep learning. It only has
> a few remaining work items, and I think we should have it in Spark 2.4.
>
> *Middle Priority*:
> SPARK-23899 <https://issues.apache.org/jira/browse/SPARK-23899>: Built-in
> SQL Function Improvement
> We've already added a lot of built-in functions in this release, but there
> are a few useful higher-order functions in progress, like `array_except`,
> `transform`, etc. It would be great if we could get them into Spark 2.4.
>
> SPARK-14220 <https://issues.apache.org/jira/browse/SPARK-14220>: Build
> and test Spark against Scala 2.12
> Very close to finishing; great to have it in Spark 2.4.
>
> SPARK-4502 <https://issues.apache.org/jira/browse/SPARK-4502>: Spark SQL
> reads unnecessary nested fields from Parquet
> This one has been open for years (thanks for your patience, Michael!), and
> is also close to finishing. Great to have it in 2.4.
>
> SPARK-24882 <https://issues.apache.org/jira/browse/SPARK-24882>: Data
> source v2 API improvement
> This is to improve the data source v2 API based on what we learned during
> this release. From the migration of existing sources and the design of new
> features, we found some problems in the API and want to address them. I
> believe this should be the last significant API change to data source v2,
> so it would be great to have it in Spark 2.4. I'll send a discuss email
> about it later.
>
> SPARK-24252 <https://issues.apache.org/jira/browse/SPARK-24252>: Add
> catalog support in Data Source V2
> This is a very important feature for data source v2, and it is currently
> being discussed on the dev list.
>
> SPARK-24768 <https://issues.apache.org/jira/browse/SPARK-24768>: Have a
> built-in AVRO data source implementation
> Most of it is done, but date/timestamp support is still missing. Great to
> have it in 2.4.
>
> SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>:
> Shuffle+Repartition on an RDD could lead to incorrect answers
> This is a long-standing correctness bug; great to have it in 2.4.
>
> There are some other important features, like adaptive execution and
> streaming SQL, that are not in the list, since I think we will not be able
> to finish them before 2.4.
>
> Feel free to add more things if you think they are important to Spark 2.4
> by replying to this email.
>
> Thanks,
> Wenchen
>
> On Mon, Jul 30, 2018 at 11:00 PM Sean Owen <sro...@apache.org> wrote:
>
>> In theory, releases happen on a time-based cadence, so it's pretty much
>> wrap up what's ready by the code freeze and ship it. In practice, the
>> cadence slips frequently, and it's very much a negotiation about which
>> features should push the code freeze out a few weeks each time. So, it's
>> kind of a hybrid approach that works OK.
>>
>> Certainly speak up if you think there's something that really needs to
>> get into 2.4. This is that discuss thread.
>>
>> (BTW, I updated the page you mention just yesterday, to reflect the plan
>> suggested in this thread.)
>>
>> On Mon, Jul 30, 2018 at 9:51 AM Tom Graves <tgraves...@yahoo.com.invalid>
>> wrote:
>>
>>> Shouldn't this be a discuss thread?
>>>
>>> I'm also happy to see more release managers, and I agree the time is
>>> getting close, but we should see which features are in progress, see how
>>> close they are, and propose a date based on that. Cutting a branch too
>>> soon just creates more work for committers, who have to push to more
>>> branches.
>>>
>>> http://spark.apache.org/versioning-policy.html mentions a code freeze
>>> and release branch cut in mid-August.
>>>
>>> Tom