Re: Apache Spark 3.3 Release

Tom Graves Mon, 21 Mar 2022 06:46:12 -0700

Maybe I'm misunderstanding what you are saying, but according to those dates, 
code freeze, by which the majority of features should be merged, was March 
15th. So if this list is all features that are not merged at this point, we 
should probably discuss whether we want them to go in or whether we need to 
change the dates. Major features going in during the QA period can destabilize 
things.
Tom
    On Monday, March 21, 2022, 01:53:24 AM CDT, Wenchen Fan 
<[email protected]> wrote:  
 
 Just checked the release calendar, the planned RC cut date is April:
Let's revisit after 2 weeks then?
On Mon, Mar 21, 2022 at 2:47 PM Wenchen Fan <[email protected]> wrote:


Shall we revisit this list after a week? Ideally, they should be either merged 
or rejected for 3.3, so that we can cut rc1. We can still discuss them case by 
case at that time if there are exceptions.
On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun <[email protected]> wrote:

Thank you for your summarization.

I believe we need to have a discussion in order to evaluate each PR's readiness.

BTW, `branch-3.3` is still open for bug fixes including minor dependency 
changes like the following.

(Backported)
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

(Upcoming)
[SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
[SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0

Dongjoon.


On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk <[email protected]> wrote:

Hi All,
Here is the allow list which I built based on your requests in this thread:   
   - SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   - SPARK-37395: Inline type hint files for files in python/pyspark/ml
   - SPARK-37093: Inline type hints python/pyspark/streaming
   - SPARK-37377: Refactor V2 Partitioning interface and remove deprecated 
usage of Distribution
   - SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
   - SPARK-32268: Bloom Filter Join
   - SPARK-38548: New SQL function: try_sum
   - SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   - SPARK-38063: Support SQL split_part function
   - SPARK-28516: Data Type Formatting Functions: `to_char`
   - SPARK-38432: Refactor framework so as JDBC dialect could compile filter by 
self way
   - SPARK-34863: Support nested column in Spark Parquet vectorized readers
   - SPARK-38194: Make Yarn memory overhead factor configurable
   - SPARK-37618: Support cleaning up shuffle blocks from external shuffle 
service
   - SPARK-37831: Add task partition id in metrics
   - SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and 
DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   - SPARK-36664: Log time spent waiting for cluster resources
   - SPARK-34659: Web UI does not correctly get appId
   - SPARK-37650: Tell spark-env.sh the python interpreter
   - SPARK-38589: New SQL function: try_avg
   - SPARK-38590: New SQL function: try_to_binary
   - SPARK-34079: Improvement CTE table scan

Best regards,
Max Gekk

On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <[email protected]> wrote:

Is the feature freeze target date March 22nd then? I saw a few dates thrown 
around and want to confirm what we landed on.

I am trying to get the following improvements through review and merged; if 
there are concerns with either, let me know:
   - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
   - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for 
released executors

Tom

    On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang 
<[email protected]> wrote:  
 
 I'd like to add the following new SQL functions in the 3.3 release. These 
functions are useful when overflow or encoding errors occur:   
   - [SPARK-38548][SQL] New SQL function: try_sum
   - [SPARK-38589][SQL] New SQL function: try_avg
   - [SPARK-38590][SQL] New SQL function: try_to_binary
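
To illustrate the intended behavior, here is a minimal pure-Python sketch of 
the "return NULL instead of failing" semantics. It is not Spark's 
implementation: the 64-bit bounds, the supported format names, and the helper 
bodies are assumptions made for illustration only.

```python
import base64
from typing import Iterable, Optional

# Assumed bounds of a 64-bit signed integer (a LONG column in this sketch).
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def try_sum(values: Iterable[Optional[int]]) -> Optional[int]:
    """Sum non-null values; return None (NULL) on overflow or empty input."""
    total: Optional[int] = None
    for v in values:
        if v is None:
            continue  # NULL inputs are skipped, as in SQL aggregates
        total = v if total is None else total + v
        if not LONG_MIN <= total <= LONG_MAX:
            return None  # overflow: yield NULL instead of raising
    return total


def try_to_binary(text: str, fmt: str = "hex") -> Optional[bytes]:
    """Decode text to bytes; return None (NULL) if the input is malformed."""
    try:
        if fmt == "hex":
            return bytes.fromhex(text)
        if fmt == "base64":
            return base64.b64decode(text, validate=True)
    except ValueError:
        return None  # encoding error: yield NULL instead of raising
    return None  # simplification: unknown formats also yield NULL here
```

In this sketch, try_sum([2**63 - 1, 1]) yields None rather than raising, and 
try_to_binary("zz") yields None because "zz" is not valid hex.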

Gengliang
On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <[email protected]> wrote:

Hello,

I've been trying for a bit to get the following two PRs merged and
into a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
python interpreter to spark-env.sh to allow some currently-unavailable
customization to happen
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse proxy-handling code where it does a greedy match for
"proxy" in the URL, and will mistakenly replace the App-ID in the
wrong place.
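
As a rough illustration of this class of bug (not the actual Spark UI code), 
the sketch below contrasts a naive substring search for "proxy" with a match 
anchored on a full path segment. The URL layout, the helper names, and the 
sample path are all hypothetical.

```python
import re
from typing import Optional


def app_id_naive(path: str) -> Optional[str]:
    """Take the segment after the FIRST occurrence of 'proxy' anywhere."""
    idx = path.find("proxy")
    if idx < 0:
        return None
    rest = path[idx + len("proxy"):].lstrip("/")
    return rest.split("/")[0]


def app_id_anchored(path: str) -> Optional[str]:
    """Only accept 'proxy' when it is a complete path segment."""
    match = re.search(r"(?:^|/)proxy/([^/]+)", path)
    return match.group(1) if match else None


# Hypothetical path where the reverse-proxy prefix itself contains "proxy".
path = "/team-proxy/spark/proxy/app-20220317-0001/jobs/"
print(app_id_naive(path))     # picks up "spark", the wrong segment
print(app_id_anchored(path))  # picks up "app-20220317-0001"
```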

I'm not exactly sure how to get attention for PRs that have been
sitting around for a while, but these are really important to our
use cases, and it would be nice to have them merged in.

Cheers
Andrew

On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <[email protected]> wrote:
>
> I'd like to add/backport the logging in 
> https://github.com/apache/spark/pull/35881 PR so that when users submit 
> issues with dynamic allocation we can better debug what's going on.
>
> On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <[email protected]> wrote:
>>
>> There is one item on our side that we want to backport to 3.3:
>> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>
>> It's already reviewed and approved.
>>
>> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <[email protected]> 
>> wrote:
>> >
>> > It looks like the version hasn't been updated on master and still shows 
>> > 3.3.0-SNAPSHOT, can you please update that.
>> >
>> > Tom
>> >
>> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk 
>> > <[email protected]> wrote:
>> >
>> >
>> > Hi All,
>> >
>> > I have created the branch for Spark 3.3:
>> > https://github.com/apache/spark/commits/branch-3.3
>> >
>> > Please backport important fixes to it, and if you have any doubts, ping 
>> > me in the PR. Regarding new features, we are still building the allow list 
>> > for branch-3.3.
>> >
>> > Best regards,
>> > Max Gekk
>> >
>> >
>> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <[email protected]> 
>> > wrote:
>> >
>> > Yes, I agree with your whitelist approach to backporting. :)
>> > Thank you for summarizing.
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <[email protected]> wrote:
>> >
>> > I think I finally got your point. What you want to keep unchanged is the 
>> > branch cut date of Spark 3.3. Today? or this Friday? This is not a big 
>> > deal.
>> >
>> > My major concern is whether we should keep merging feature work or 
>> > dependency upgrades after the branch cut. To make our release time more 
>> > predictable, I suggest we finalize the exception PR list first, instead 
>> > of merging them in an ad hoc way. In the past, we spent a lot of time 
>> > reverting PRs that were merged after the branch cut. I hope we can 
>> > minimize unnecessary arguments in this release. Do you agree, Dongjoon?
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 15:55, Dongjoon Hyun <[email protected]> wrote:
>> >
>> > That is not totally fine, Xiao. It sounds like you are asking for a change 
>> > of plan without a proper reason.
>> >
>> > Although we cut the branch today according to our plan, you can still 
>> > collect the PRs and make a list of exceptions. I'm not blocking what you 
>> > want to do.
>> >
>> > Please let the community start to ramp down as we agreed before.
>> >
>> > Dongjoon
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <[email protected]> wrote:
>> >
>> > Please do not get me wrong. If we don't cut a branch, we are allowing all 
>> > patches to land in Apache Spark 3.3. That is totally fine. After we cut 
>> > the branch, we should avoid merging feature work. In the next three days, 
>> > let us collect the actively developed PRs for which we want to make an 
>> > exception (i.e., merge to 3.3 after the upcoming branch cut). Does that 
>> > make sense?
>> >
>> > On Tue, Mar 15, 2022 at 14:54, Dongjoon Hyun <[email protected]> wrote:
>> >
>> > Xiao. You are working against what you are saying.
>> > If you don't cut a branch, it means you are allowing all patches to land 
>> > in Apache Spark 3.3. No?
>> >
>> > > we need to avoid backporting feature work that has not been well 
>> > > discussed.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <[email protected]> wrote:
>> >
>> > Cutting the branch is simple, but we need to avoid backporting feature 
>> > work that has not been well discussed. Not all the members are actively 
>> > following the dev list. I think we should wait 3 more days to collect 
>> > the PR list before cutting the branch.
>> >
>> > BTW, there is very little 3.4-only feature work that will be affected.
>> >
>> > Xiao
>> >
>> > On Tue, Mar 15, 2022 at 11:49, Dongjoon Hyun <[email protected]> wrote:
>> >
>> > Hi, Max, Chao, Xiao, Holden and all.
>> >
>> > I have a different idea.
>> >
>> > Given the situation and small patch list, I don't think we need to 
>> > postpone the branch cut for those patches. It's easier to cut a branch-3.3 
>> > and allow backporting.
>> >
>> > As of today, we already have an obvious Apache Spark 3.4 patch in the 
>> > branch. This situation will only get worse because there is no way to 
>> > block other patches from landing unintentionally if we don't cut a 
>> > branch.
>> >
>> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>> >
>> > Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>> >
>> > Best,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <[email protected]> wrote:
>> >
>> > Cool, thanks for clarifying!
>> >
>> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <[email protected]> wrote:
>> > >>
>> > >> For the following list:
>> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized 
>> > >> reader
>> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> > >> Do you mean we should include them, or exclude them from 3.3?
>> > >
>> > >
>> > > If possible, I hope these features can be shipped with Spark 3.3.
>> > >
>> > >
>> > >
>> > > On Tue, Mar 15, 2022 at 10:06, Chao Sun <[email protected]> wrote:
>> > >>
>> > >> Hi Xiao,
>> > >>
>> > >> For the following list:
>> > >>
>> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized 
>> > >> reader
>> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> > >>
>> > >> Do you mean we should include them, or exclude them from 3.3?
>> > >>
>> > >> Thanks,
>> > >> Chao
>> > >>
>> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <[email protected]> 
>> > >> wrote:
>> > >> >
>> > >> > The following was tested and merged a few minutes ago. So, we can 
>> > >> > remove it from the list.
>> > >> >
>> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> > >> >
>> > >> > Thanks,
>> > >> > Dongjoon.
>> > >> >
>> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <[email protected]> wrote:
>> > >> >>
>> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to 
>> > >> >> collect the list of actively developed PRs that we want to merge to 
>> > >> >> 3.3 after the branch cut?
>> > >> >>
>> > >> >> Please do not rush to merge the PRs that are not fully reviewed. We 
>> > >> >> can cut the branch this Friday and continue merging the PRs that 
>> > >> >> have been discussed in this thread. Does that make sense?
>> > >> >>
>> > >> >> Xiao
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> On Tue, Mar 15, 2022 at 09:10, Holden Karau <[email protected]> wrote:
>> > >> >>>
>> > >> >>> May I suggest we push out one week (22nd) just to give everyone a 
>> > >> >>> bit of breathing space? Rushed software development more often 
>> > >> >>> results in bugs.
>> > >> >>>
>> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <[email protected]> 
>> > >> >>> wrote:
>> > >> >>>>
>> > >> >>>> > To make our release time more predictable, let us collect the 
>> > >> >>>> > PRs and wait three more days before the branch cut?
>> > >> >>>>
>> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> > >> >>>>
>> > >> >>>> Three more days are OK for this from my view.
>> > >> >>>>
>> > >> >>>> Regards,
>> > >> >>>> Yikun
>> > >> >>>
>> > >> >>> --
>> > >> >>> Twitter: https://twitter.com/holdenkarau
>> > >> >>> Books (Learning Spark, High Performance Spark, etc.): 
>> > >> >>> https://amzn.to/2MaRAG9
>> > >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]
