Just checked the release calendar; the planned RC cut date is in April: [image: image.png]. Let's revisit after two weeks, then?
On Mon, Mar 21, 2022 at 2:47 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> Shall we revisit this list after a week? Ideally, they should be either merged or rejected for 3.3 by then, so that we can cut rc1. We can still discuss them case by case at that time if there are exceptions.
>
> On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Thank you for your summarization.
>>
>> I believe we need to have a discussion in order to evaluate each PR's readiness.
>>
>> BTW, `branch-3.3` is still open for bug fixes, including minor dependency changes like the following.
>>
>> (Backported)
>> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
>> Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
>> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
>>
>> (Upcoming)
>> [SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
>> [SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0
>>
>> Dongjoon.
>>
>> On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk <maxim.g...@databricks.com> wrote:
>>
>>> Hi All,
>>>
>>> Here is the allow list which I built based on your requests in this thread:
>>>
>>> 1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
>>> 2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>> 3. SPARK-37093: Inline type hints python/pyspark/streaming
>>> 4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated usage of Distribution
>>> 5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
>>> 6. SPARK-32268: Bloom Filter Join
>>> 7. SPARK-38548: New SQL function: try_sum
>>> 8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>> 9. SPARK-38063: Support SQL split_part function
>>> 10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>> 11. SPARK-38432: Refactor framework so as JDBC dialect could compile filter by self way
>>> 12. SPARK-34863: Support nested column in Spark Parquet vectorized readers
>>> 13. SPARK-38194: Make Yarn memory overhead factor configurable
>>> 14. SPARK-37618: Support cleaning up shuffle blocks from external shuffle service
>>> 15. SPARK-37831: Add task partition id in metrics
>>> 16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>> 17. SPARK-36664: Log time spent waiting for cluster resources
>>> 18. SPARK-34659: Web UI does not correctly get appId
>>> 19. SPARK-37650: Tell spark-env.sh the python interpreter
>>> 20. SPARK-38589: New SQL function: try_avg
>>> 21. SPARK-38590: New SQL function: try_to_binary
>>> 22. SPARK-34079: Improvement CTE table scan
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tgraves...@yahoo.com> wrote:
>>>
>>>> Is the feature freeze target date March 22nd, then? I saw a few dates thrown around and want to confirm what we landed on.
>>>>
>>>> I am trying to get the following improvements through review and in; if there are concerns with either, let me know:
>>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries <https://github.com/apache/spark/pull/32298#>
>>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors <https://github.com/apache/spark/pull/35085#>
>>>>
>>>> Tom
>>>>
>>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <ltn...@gmail.com> wrote:
>>>>
>>>> I'd like to add the following new SQL functions in the 3.3 release.
>>>> These functions are useful when overflow or encoding errors occur:
>>>>
>>>> - [SPARK-38548][SQL] New SQL function: try_sum <https://github.com/apache/spark/pull/35848>
>>>> - [SPARK-38589][SQL] New SQL function: try_avg <https://github.com/apache/spark/pull/35896>
>>>> - [SPARK-38590][SQL] New SQL function: try_to_binary <https://github.com/apache/spark/pull/35897>
>>>>
>>>> Gengliang
>>>>
>>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <andrew.m...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been trying for a bit to get the following two PRs merged and into a release, and I'm having some difficulty moving them forward:
>>>>
>>>> https://github.com/apache/spark/pull/34903 - This passes the current Python interpreter to spark-env.sh to allow some currently-unavailable customization to happen.
>>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the Spark UI reverse-proxy handling code where a greedy match for "proxy" in the URL mistakenly replaces the app ID in the wrong place.
>>>>
>>>> I'm not exactly sure how to get attention on PRs that have been sitting around for a while, but these are really important to our use cases, and it would be nice to have them merged in.
>>>>
>>>> Cheers,
>>>> Andrew
>>>>
>>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>> >
>>>> > I'd like to add/backport the logging in https://github.com/apache/spark/pull/35881 so that when users submit issues with dynamic allocation we can better debug what's going on.
>>>> >
>>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <sunc...@apache.org> wrote:
>>>> >>
>>>> >> There is one item on our side that we want to backport to 3.3:
>>>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>>> >>
>>>> >> It's already reviewed and approved.
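[Editor's note] The second PR Andrew mentions (#31774) concerns a greedy match on "proxy" rewriting a URL in the wrong place. The snippet below is a hypothetical illustration of that class of bug, not Spark's actual proxy code: a rewrite rule that anchors on "proxy/" lands at the wrong occurrence when the pattern is greedy and the path contains "proxy" twice.

```python
import re

# Hypothetical path in which "proxy/" appears twice; the rewrite rule is
# meant to insert an app ID right after the FIRST "proxy/" segment.
path = "/proxy/history/proxy/jobs"

# Greedy ".*" consumes as much as possible, so "proxy/" matches the LAST
# occurrence and the app ID is inserted in the wrong place.
greedy = re.sub(r"(.*proxy/)", r"\1app-1234/", path, count=1)

# Non-greedy ".*?" stops at the FIRST "proxy/", giving the intended result.
lazy = re.sub(r"(.*?proxy/)", r"\1app-1234/", path, count=1)

print(greedy)  # /proxy/history/proxy/app-1234/jobs  <- wrong place
print(lazy)    # /proxy/app-1234/history/proxy/jobs  <- intended
```

The path, pattern, and `app-1234` name here are invented for illustration; the fix in a real reverse proxy would also typically anchor the pattern to the start of the path rather than rely on quantifier laziness.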
>>>> >>
>>>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>>>> >> >
>>>> >> > It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT; can you please update that?
>>>> >> >
>>>> >> > Tom
>>>> >> >
>>>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <maxim.g...@databricks.com.invalid> wrote:
>>>> >> >
>>>> >> > Hi All,
>>>> >> >
>>>> >> > I have created the branch for Spark 3.3:
>>>> >> > https://github.com/apache/spark/commits/branch-3.3
>>>> >> >
>>>> >> > Please backport important fixes to it, and if you have any doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3.
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Max Gekk
>>>> >> >
>>>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>> >> >
>>>> >> > Yes, I agree with your whitelist approach for backporting. :)
>>>> >> > Thank you for summarizing.
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>> >> >
>>>> >> > I think I finally got your point. What you want to keep unchanged is the branch cut date of Spark 3.3. Today? Or this Friday? This is not a big deal.
>>>> >> >
>>>> >> > My major concern is whether we should keep merging feature work or dependency upgrades after the branch cut. To make our release time more predictable, I am suggesting we finalize the exception PR list first, instead of merging them in an ad hoc way. In the past, we spent a lot of time on reverts of PRs that were merged after the branch cut. I hope we can minimize unnecessary arguments in this release. Do you agree, Dongjoon?
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 3:55 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>> >> >
>>>> >> > That is not totally fine, Xiao. It sounds like you are asking for a change of plan without a proper reason.
>>>> >> >
>>>> >> > Although we cut the branch today according to our plan, you can still collect the list and make a list of exceptions. I'm not blocking what you want to do.
>>>> >> >
>>>> >> > Please let the community start to ramp down as we agreed before.
>>>> >> >
>>>> >> > Dongjoon
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>> >> >
>>>> >> > Please do not get me wrong. If we don't cut a branch, we are allowing all patches to land in Apache Spark 3.3. That is totally fine. After we cut the branch, we should avoid merging feature work. In the next three days, let us collect the actively developed PRs that we want to make exceptions (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 2:54 PM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>> >> >
>>>> >> > Xiao, you are working against what you are saying.
>>>> >> > If you don't cut a branch, it means you are allowing all patches to land in Apache Spark 3.3, no?
>>>> >> >
>>>> >> > > we need to avoid backporting the feature work that has not been well discussed.
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <gatorsm...@gmail.com> wrote:
>>>> >> >
>>>> >> > Cutting the branch is simple, but we need to avoid backporting the feature work that has not been well discussed. Not all the members are actively following the dev list. I think we should wait three more days to collect the PR list before cutting the branch.
>>>> >> >
>>>> >> > BTW, there is very little 3.4-only feature work that will be affected.
>>>> >> >
>>>> >> > Xiao
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 11:49 AM, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>> >> >
>>>> >> > Hi Max, Chao, Xiao, Holden, and all.
>>>> >> >
>>>> >> > I have a different idea.
>>>> >> >
>>>> >> > Given the situation and the small patch list, I don't think we need to postpone the branch cut for those patches. It's easier to cut a branch-3.3 and allow backporting.
>>>> >> >
>>>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in the branch. This situation only becomes worse because there is no way to block other patches from landing unintentionally if we don't cut a branch.
>>>> >> >
>>>> >> > [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>>> >> >
>>>> >> > Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>>>> >> >
>>>> >> > Best,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <sunc...@apache.org> wrote:
>>>> >> >
>>>> >> > Cool, thanks for clarifying!
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >
>>>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
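[Editor's note] For readers unfamiliar with the proposed try_* functions (e.g. try_sum in #35848, described earlier as "useful when overflow or encoding errors occur"), here is a rough Python sketch of the intended semantics as described in the thread — return SQL NULL instead of raising on overflow. This is an illustration only, not Spark's implementation; the helper name and the exact NULL-handling details are assumptions.

```python
# 64-bit signed long bounds, matching Spark SQL's BIGINT type.
LONG_MIN, LONG_MAX = -(1 << 63), (1 << 63) - 1

def try_sum(values):
    """Sum 64-bit integers, returning None (SQL NULL) on overflow
    instead of raising an error. Illustrative sketch only."""
    total, seen = 0, False
    for v in values:
        if v is None:          # SQL aggregates skip NULL inputs
            continue
        seen = True
        total += v
        if not (LONG_MIN <= total <= LONG_MAX):
            return None        # overflow -> NULL rather than an error
    return total if seen else None  # SUM over no rows is NULL

print(try_sum([1, 2, None, 3]))    # 6
print(try_sum([LONG_MAX, 1]))      # None (overflow)
```

By contrast, the plain `sum` aggregate under ANSI mode would raise an error on the overflowing input rather than returning NULL.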
>>>> >> > >
>>>> >> > > On Tue, Mar 15, 2022 at 10:06 AM, Chao Sun <sunc...@apache.org> wrote:
>>>> >> > >>
>>>> >> > >> Hi Xiao,
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >>
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >>
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >>
>>>> >> > >> Thanks,
>>>> >> > >> Chao
>>>> >> > >>
>>>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>> >> > >> >
>>>> >> > >> > The following was tested and merged a few minutes ago, so we can remove it from the list.
>>>> >> > >> >
>>>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>> >> > >> >
>>>> >> > >> > Thanks,
>>>> >> > >> > Dongjoon.
>>>> >> > >> >
>>>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <gatorsm...@gmail.com> wrote:
>>>> >> > >> >>
>>>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait three more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
>>>> >> > >> >>
>>>> >> > >> >> Please do not rush to merge PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
>>>> >> > >> >>
>>>> >> > >> >> Xiao
>>>> >> > >> >>
>>>> >> > >> >> On Tue, Mar 15, 2022 at 9:10 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>> >> > >> >>>
>>>> >> > >> >>> May I suggest we push out one week (to the 22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
>>>> >> > >> >>>
>>>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>> >> > >> >>>>
>>>> >> > >> >>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
>>>> >> > >> >>>>
>>>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>> >> > >> >>>>
>>>> >> > >> >>>> Three more days are OK for this from my view.
>>>> >> > >> >>>>
>>>> >> > >> >>>> Regards,
>>>> >> > >> >>>> Yikun
>>>> >> > >> >>>
>>>> >> > >> >>> --
>>>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>>>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>> >> > >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> >
>>>> > --
>>>> > Twitter: https://twitter.com/holdenkarau
>>>> > Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org