Thanks for the update, Max! Just a small clarification: the following should be moved to RESOLVED:
1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
3. SPARK-37093: Inline type hints python/pyspark/streaming

On 4/28/22 14:42, Maxim Gekk wrote:
> Hello All,
>
> I am going to create the first release candidate of Spark 3.3 at the
> beginning of next week if there are no objections. Below is the list of
> allowed features and their current status. At the moment, only one
> feature is still in progress, but I guess it can be postponed to the
> next release:
>
> IN PROGRESS:
>
> 1. SPARK-28516: Data Type Formatting Functions: `to_char`
>
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
>
> 1. SPARK-37650: Tell spark-env.sh the python interpreter
> 2. SPARK-36664: Log time spent waiting for cluster resources
> 3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
> 4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
> 5. SPARK-37093: Inline type hints python/pyspark/streaming
>
> RESOLVED:
>
> 1. SPARK-32268: Bloom Filter Join
> 2. SPARK-38548: New SQL function: try_sum
> 3. SPARK-38063: Support SQL split_part function
> 4. SPARK-38432: Refactor framework so as JDBC dialect could compile filter by self way
> 5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
> 6. SPARK-38194: Make Yarn memory overhead factor configurable
> 7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle service
> 8. SPARK-37831: Add task partition id in metrics
> 9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 10. SPARK-38590: New SQL function: try_to_binary
> 11. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated usage of Distribution
> 12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
> 13. SPARK-34659: Web UI does not correctly get appId
> 14. SPARK-38589: New SQL function: try_avg
> 15. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
> 16. SPARK-34079: Improvement CTE table scan
>
> Max Gekk
> Software Engineer
> Databricks, Inc.
>
> On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk <maxim.g...@databricks.com> wrote:
>
> Hello All,
>
> The current status of features from the allow list for branch-3.3 is:
>
> IN PROGRESS:
>
> 1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
> 2. SPARK-28516: Data Type Formatting Functions: `to_char`
> 3. SPARK-34079: Improvement CTE table scan
>
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
>
> 1. SPARK-37650: Tell spark-env.sh the python interpreter
> 2. SPARK-36664: Log time spent waiting for cluster resources
> 3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
> 4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
> 5. SPARK-37093: Inline type hints python/pyspark/streaming
>
> RESOLVED:
>
> 1. SPARK-32268: Bloom Filter Join
> 2. SPARK-38548: New SQL function: try_sum
> 3. SPARK-38063: Support SQL split_part function
> 4. SPARK-38432: Refactor framework so as JDBC dialect could compile filter by self way
> 5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
> 6. SPARK-38194: Make Yarn memory overhead factor configurable
> 7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle service
> 8. SPARK-37831: Add task partition id in metrics
> 9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 10. SPARK-38590: New SQL function: try_to_binary
> 11. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated usage of Distribution
> 12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
> 13. SPARK-34659: Web UI does not correctly get appId
> 14. SPARK-38589: New SQL function: try_avg
>
> Max Gekk
> Software Engineer
> Databricks, Inc.
>
> On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk <maxim.g...@databricks.com> wrote:
>
> Hello All,
>
> Below is the current status of features from the allow list:
>
> IN PROGRESS:
>
> 1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
> 2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
> 3. SPARK-37093: Inline type hints python/pyspark/streaming
> 4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated usage of Distribution
> 5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
> 6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
> 7. SPARK-28516: Data Type Formatting Functions: `to_char`
> 8. SPARK-36664: Log time spent waiting for cluster resources
> 9. SPARK-34659: Web UI does not correctly get appId
> 10. SPARK-37650: Tell spark-env.sh the python interpreter
> 11. SPARK-38589: New SQL function: try_avg
> 12. SPARK-38590: New SQL function: try_to_binary
> 13. SPARK-34079: Improvement CTE table scan
>
> RESOLVED:
>
> 1. SPARK-32268: Bloom Filter Join
> 2. SPARK-38548: New SQL function: try_sum
> 3. SPARK-38063: Support SQL split_part function
> 4. SPARK-38432: Refactor framework so as JDBC dialect could compile filter by self way
> 5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
> 6. SPARK-38194: Make Yarn memory overhead factor configurable
> 7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle service
> 8. SPARK-37831: Add task partition id in metrics
> 9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>
> We need to decide whether we are going to wait a little bit more or close the doors.
>
> Maxim Gekk
> Software Engineer
> Databricks, Inc.
>
> On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk <maxim.g...@databricks.com> wrote:
>
> Hi All,
>
> Here is the allow list which I built based on your requests in this thread:
>
> 1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
> 2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
> 3. SPARK-37093: Inline type hints python/pyspark/streaming
> 4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated usage of Distribution
> 5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
> 6. SPARK-32268: Bloom Filter Join
> 7. SPARK-38548: New SQL function: try_sum
> 8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
> 9. SPARK-38063: Support SQL split_part function
> 10. SPARK-28516: Data Type Formatting Functions: `to_char`
> 11. SPARK-38432: Refactor framework so as JDBC dialect could compile filter by self way
> 12. SPARK-34863: Support nested column in Spark Parquet vectorized readers
> 13. SPARK-38194: Make Yarn memory overhead factor configurable
> 14. SPARK-37618: Support cleaning up shuffle blocks from external shuffle service
> 15. SPARK-37831: Add task partition id in metrics
> 16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 17. SPARK-36664: Log time spent waiting for cluster resources
> 18. SPARK-34659: Web UI does not correctly get appId
> 19. SPARK-37650: Tell spark-env.sh the python interpreter
> 20. SPARK-38589: New SQL function: try_avg
> 21. SPARK-38590: New SQL function: try_to_binary
> 22. SPARK-34079: Improvement CTE table scan
>
> Best regards,
> Max Gekk
>
> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tgraves...@yahoo.com> wrote:
>
> Is the feature freeze target date March 22nd then? I saw a few dates thrown around and want to confirm what we landed on.
>
> I am trying to get the following improvements finished, reviewed, and in; if there are concerns with either, let me know:
> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries <https://github.com/apache/spark/pull/32298>
> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors <https://github.com/apache/spark/pull/35085>
>
> Tom
>
> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <ltn...@gmail.com> wrote:
>
> I'd like to add the following new SQL functions in the 3.3 release. These functions are useful when overflow or encoding errors occur:
>
> * [SPARK-38548][SQL] New SQL function: try_sum <https://github.com/apache/spark/pull/35848>
> * [SPARK-38589][SQL] New SQL function: try_avg <https://github.com/apache/spark/pull/35896>
> * [SPARK-38590][SQL] New SQL function: try_to_binary <https://github.com/apache/spark/pull/35897>
>
> Gengliang
>
> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <andrew.m...@gmail.com> wrote:
>
> Hello,
>
> I've been trying for a while to get the following two PRs merged and into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current python interpreter to spark-env.sh to allow some currently-unavailable customization to happen.
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the SparkUI reverse proxy-handling code where it does a greedy match for "proxy" in the URL and will mistakenly replace the App-ID in the wrong place.
>
> I'm not exactly sure how to get attention on PRs that have been sitting around for a while, but these are really important to our use cases, and it would be nice to have them merged in.
>
> Cheers,
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <hol...@pigscanfly.ca> wrote:
> >
> > I'd like to add/backport the logging in https://github.com/apache/spark/pull/35881 so that when users submit issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <sunc...@apache.org> wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT; can you please update that?
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <maxim.g...@databricks.com.invalid> wrote:
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please backport important fixes to it, and if you have some doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you on your whitelist approach for backporting. :)
> >> > Thank you for summarizing.
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <gatorsm...@gmail.com> wrote:
> >> >
> >> > I think I finally got your point. What you want to keep unchanged is the branch cut date of Spark 3.3. Today? Or this Friday? This is not a big deal.
> >> >
> >> > My major concern is whether we should keep merging feature work or dependency upgrades after the branch cut. To make our release time more predictable, I am suggesting we finalize the exception PR list first, instead of merging them in an ad hoc way. In the past, we spent a lot of time on reverts of PRs that were merged after the branch cut. I hope we can minimize unnecessary arguments in this release. Do you agree, Dongjoon?
> >> >
> >> > On Tue, Mar 15, 2022 at 15:55, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >> >
> >> > That is not totally fine, Xiao. It sounds like you are asking for a change of plan without a proper reason.
> >> >
> >> > Although we cut the branch today according to our plan, you still can collect the list and make a list of exceptions. I'm not blocking what you want to do.
> >> >
> >> > Please let the community start to ramp down as we agreed before.
> >> >
> >> > Dongjoon
> >> >
> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <gatorsm...@gmail.com> wrote:
> >> >
> >> > Please do not get me wrong. If we don't cut a branch, we are allowing all patches to land in Apache Spark 3.3. That is totally fine. After we cut the branch, we should avoid merging feature work. In the next three days, let us collect the actively developed PRs that we want to make exceptions for (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
> >> >
> >> > On Tue, Mar 15, 2022 at 14:54, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Xiao. You are working against what you are saying.
> >> > If you don't cut a branch, it means you are allowing all patches to land in Apache Spark 3.3. No?
> >> >
> >> > > we need to avoid backporting the feature work that are not being well discussed.
> >> >
> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <gatorsm...@gmail.com> wrote:
> >> >
> >> > Cutting the branch is simple, but we need to avoid backporting feature work that has not been well discussed. Not all members are actively following the dev list. I think we should wait 3 more days to collect the PR list before cutting the branch.
> >> >
> >> > BTW, there is very little 3.4-only feature work that will be affected.
> >> >
> >> > Xiao
> >> >
> >> > On Tue, Mar 15, 2022 at 11:49, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >> >
> >> > Hi, Max, Chao, Xiao, Holden, and all.
> >> >
> >> > I have a different idea.
> >> >
> >> > Given the situation and the small patch list, I don't think we need to postpone the branch cut for those patches. It's easier to cut branch-3.3 and allow backporting.
> >> >
> >> > As of today, we already have an obvious Apache Spark 3.4 patch in the branch. This situation only becomes worse and worse because there is no way to block other patches from landing unintentionally if we don't cut a branch.
> >> >
> >> > [SPARK-38335][SQL] Implement parser support for DEFAULT column values
> >> >
> >> > Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
> >> >
> >> > Best,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <sunc...@apache.org> wrote:
> >> >
> >> > Cool, thanks for clarifying!
> >> >
> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <gatorsm...@gmail.com> wrote:
> >> >>
> >> >> For the following list:
> >> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
> >> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> >> Do you mean we should include them, or exclude them from 3.3?
> >> >
> >> > If possible, I hope these features can be shipped with Spark 3.3.
> >> >
> >> > On Tue, Mar 15, 2022 at 10:06, Chao Sun <sunc...@apache.org> wrote:
> >> >>
> >> >> Hi Xiao,
> >> >>
> >> >> For the following list:
> >> >>
> >> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
> >> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> >>
> >> >> Do you mean we should include them, or exclude them from 3.3?
> >> >>
> >> >> Thanks,
> >> >> Chao
> >> >>
> >> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
> >> >> >
> >> >> > The following was tested and merged a few minutes ago, so we can remove it from the list.
> >> >> >
> >> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >> >
> >> >> > Thanks,
> >> >> > Dongjoon.
> >> >> >
> >> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <gatorsm...@gmail.com> wrote:
> >> >> >>
> >> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
> >> >> >>
> >> >> >> Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
> >> >> >>
> >> >> >> Xiao
> >> >> >>
> >> >> >> On Tue, Mar 15, 2022 at 09:10, Holden Karau <hol...@pigscanfly.ca> wrote:
> >> >> >>>
> >> >> >>> May I suggest we push out one week (to the 22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
> >> >> >>>
> >> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yikunk...@gmail.com> wrote:
> >> >> >>>>
> >> >> >>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
> >> >> >>>>
> >> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
> >> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >> >>>>
> >> >> >>>> Three more days are OK for this from my view.
> >> > >> >>>> > >> > >> >>>> Regards, > >> > >> >>>> Yikun > >> > >> >>> > >> > >> >>> -- > >> > >> >>> Twitter: https://twitter.com/holdenkarau > <https://twitter.com/holdenkarau> > >> > >> >>> Books (Learning Spark, High Performance > Spark, etc.): https://amzn.to/2MaRAG9 > <https://amzn.to/2MaRAG9> > >> > >> >>> YouTube Live Streams: > https://www.youtube.com/user/holdenkarau > <https://www.youtube.com/user/holdenkarau> > > > > > > > > -- > > Twitter: https://twitter.com/holdenkarau > <https://twitter.com/holdenkarau> > > Books (Learning Spark, High Performance Spark, > etc.): https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> > > YouTube Live Streams: > https://www.youtube.com/user/holdenkarau > <https://www.youtube.com/user/holdenkarau> > > > --------------------------------------------------------------------- > To unsubscribe e-mail: > dev-unsubscr...@spark.apache.org > <mailto:dev-unsubscr...@spark.apache.org> > -- Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC