PR builder broken

2023-05-10 Thread Xingbo Jiang
Hi dev, I've seen multiple PR builder failures like the one below since this morning: ``` TypeError: Cannot read properties of undefined (reading 'head_sha') at eval (eval at callAsyncFunction (/home/runner/work/_actions/actions/github-script/v6/dist/index.js:15143:16), :81:22) Error: Unhandled

Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Xingbo Jiang
+1 On Wed, Nov 30, 2022 at 5:59 PM Jungtaek Lim wrote: > Starting with +1 from me. > > On Thu, Dec 1, 2022 at 10:54 AM Jungtaek Lim > wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: Asynchronous Offset Management in >> Structured Streaming. >> >> The high level summary of the

Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Xingbo Jiang
Congratulations! On Tue, Aug 9, 2022 at 8:31 PM Yuanjian Li wrote: > Congratulations, Xinrong! > > On Tue, Aug 9, 2022 at 7:18 PM XiDuo You wrote: > >> Congratulations! >> >> On Wed, Aug 10, 2022 at 9:30 AM Haejoon Lee wrote: >> > >> > Congrats, Xinrong!! >> > >> > On Tue, Aug 9, 2022 at 5:12 PM Hyukjin Kwon >> wrote: >> >> >> >> Hi all, >>

Re: SIGMOD System Award for Apache Spark

2022-05-13 Thread Xingbo Jiang
Congratulations! On Fri, May 13, 2022 at 9:43 AM Xiao Li wrote: > Congratulations to everyone! > > Xiao > > On Fri, May 13, 2022 at 9:34 AM Dongjoon Hyun > wrote: > >> Ya, it's really great!. Congratulations to the whole community! >> >> Dongjoon. >> >> On Fri, May 13, 2022 at 8:12 AM Chao Sun

Re: [VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-14 Thread Xingbo Jiang
+1 This is an exciting new feature! On Sun, Sep 13, 2020 at 8:00 PM Mridul Muralidharan wrote: > Hi, > > I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based > shuffle to improve shuffle efficiency. > Please take a look at: > >- SPIP jira:

Re: [OSS DIGEST] The major changes of Apache Spark from June 3 to June 16

2020-07-21 Thread Xingbo Jiang
des are > being shutdown & > > [SPARK-21040][CORE] Speculate tasks which are running on decommission > executors two of the PRs merged after the decommissioning SPIP. > > On Tue, Jul 21, 2020 at 10:53 AM Xingbo Jiang > wrote: > >> Hi all, >> >> This is the bi-w

[OSS DIGEST] The major changes of Apache Spark from June 3 to June 16

2020-07-21 Thread Xingbo Jiang
Hi all, This is the bi-weekly Apache Spark digest from the Databricks OSS team. For each API/configuration/behavior change, an *[API]* tag is added in the title. CORE

Re: Welcoming some new Apache Spark committers

2020-07-14 Thread Xingbo Jiang
Welcome, Huaxin, Jungtaek, and Dilip! Congratulations! On Tue, Jul 14, 2020 at 10:37 AM Matei Zaharia wrote: > Hi all, > > The Spark PMC recently voted to add several new committers. Please join me > in welcoming them to their new roles! The new committers are: > > - Huaxin Gao > - Jungtaek

Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread Xingbo Jiang
+1 (non-binding) On Mon, Jun 8, 2020 at 9:50 PM Jiaxin Shan wrote: > +1 > I build binary using the following command, test spark workloads on > Kubernetes (AWS EKS) and it's working well. > > ./dev/make-distribution.sh --name spark-v3.0.0-rc3-20200608 --tgz > -Phadoop-3.2 -Pkubernetes -Phive -Phive-thriftserver

Re: [OSS DIGEST] The major changes of Apache Spark from Mar 25 to Apr 7

2020-04-29 Thread Xingbo Jiang
Thank you so much for doing this, Xiao! On Wed, Apr 29, 2020 at 11:09 AM Xiao Li wrote: > Hi all, > > This is the bi-weekly Apache Spark digest from the Databricks OSS team. > For each API/configuration/behavior change, an *[API]* tag is added in > the title. > > CORE >

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-13 Thread Xingbo Jiang
Hello, > > > > On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang > wrote: > >> > >> Hi all, > >> > >> Based on my experience, there is no scenario that necessarily requires > deploying multiple Workers on the same node with Standalone backend. A >

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-12 Thread Xingbo Jiang
ice so > far. > > Why do we consider it as a special case for scheduling, where two workers > are on the same node than two different nodes? Possibly, optimize on > network I/o and disk I/O? > > On Tue, Mar 3, 2020 at 12:45 AM Xingbo Jiang > wrote: > >> Thanks Sean for

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Xingbo Jiang
+1 (non-binding) Cheers, Xingbo On Mon, Mar 9, 2020 at 9:35 AM Xiao Li wrote: > +1 (binding) > > Xiao > > On Mon, Mar 9, 2020 at 8:33 AM Denny Lee wrote: > >> +1 (non-binding) >> >> On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon wrote: >> >>> The proposal itself seems good as the factors to

Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-03-02 Thread Xingbo Jiang
practice enough > to enforce 1 worker per machine, if it makes things simpler rather > than harder. > > On Fri, Feb 28, 2020 at 1:21 PM Xingbo Jiang > wrote: > > > > Hi all, > > > > Based on my experience, there is no scenario that necessarily requires > de

[DISCUSS] Remove multiple workers on the same host support from Standalone backend

2020-02-28 Thread Xingbo Jiang
Hi all, Based on my experience, there is no scenario that necessarily requires deploying multiple Workers on the same node with the Standalone backend. A Worker should book all the resources reserved for Spark on the host where it is launched; it can then allocate those resources to one or more executors

Re: spark-3.0.0-preview release notes link is broken

2019-11-28 Thread Xingbo Jiang
Hi Sandeep, Thanks for reporting! spark-3.0.0-preview is not a stable release, so we should not include this version in the `Release Notes for Stable Releases` section. I've submitted a PR ( https://github.com/apache/spark-website/pull/235) to fix the issue. Cheers, Xingbo On Thu, Nov 28, 2019

[ANNOUNCE] Announcing Apache Spark 3.0.0-preview

2019-11-07 Thread Xingbo Jiang
Hi all, To enable wide-scale community testing of the upcoming Spark 3.0 release, the Apache Spark community has posted a preview release of Spark 3.0. This preview is *not a stable release in terms of either API or functionality*, but it is meant to give the community early access to try the

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-04 Thread Xingbo Jiang
>>>> On Thu, Oct 31, 2019 at 3:54 PM Sean Owen wrote: >>>>> >>>>>> This isn't a big thing, but I see that the pyspark build includes >>>>>> Hadoop 2.7 rather than 3.2. Maybe later we change the build to put in >>>>>

[VOTE] SPARK 3.0.0-preview (RC2)

2019-10-31 Thread Xingbo Jiang
Please vote on releasing the following candidate as Apache Spark version 3.0.0-preview. The vote is open until November 3 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.0.0-preview [ ] -1 Do not release this package

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-30 Thread Xingbo Jiang
> On Wed, Oct 30, 2019 at 4:15 PM Xingbo Jiang > wrote: > >> Hi all, >> >> This RC fails because: >> It fails to generate a PySpark release. >> >> I'll start RC2 soon. >> >> Thanks! >> >> Xingbo >> >> >> On We

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-30 Thread Xingbo Jiang
Hi all, This RC fails because it fails to generate a PySpark release. I'll start RC2 soon. Thanks! Xingbo On Wed, Oct 30, 2019 at 4:10 PM Xingbo Jiang wrote: > Thanks Sean, since we need to generate PySpark release with a different > name, I would prefer fail RC1 and start another r

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-30 Thread Xingbo Jiang
me tag, we can evaluate it > within this same release candidate. Otherwise, just roll another > release candidate. > > I was able to build it and pass all tests with JDK 8 and JDK 11 > (hadoop-3.2 profile, note) on Ubuntu, so this is otherwise looking > good to me. > > On Tue,

Re: Packages to release in 3.0.0-preview

2019-10-30 Thread Xingbo Jiang
Scala 2.13 support is tracked by SPARK-25075 (https://issues.apache.org/jira/browse/SPARK-25075). At this time there are still major issues remaining, so we don't include Scala 2.13 support in the 3.0.0-preview release. If the task is finished before the code freeze of Spark 3.0.0, then it's still

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-29 Thread Xingbo Jiang
iew.tgz.sha512 > > > Bests, > Dongjoon. > > > On Tue, Oct 29, 2019 at 7:18 PM Xingbo Jiang > wrote: > >> Thanks for the correction, we shall remove the statement >>> >>> Everything else please retarget to an appropriate release. >&

Re: [VOTE] SPARK 3.0.0-preview (RC1)

2019-10-29 Thread Xingbo Jiang
ue, Oct 29, 2019 at 7:01 PM Xingbo Jiang > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 3.0.0-preview. >> >> The vote is open until November 2 PST and passes if a majority +1 PMC >> votes are cast, with >> a min

[VOTE] SPARK 3.0.0-preview (RC1)

2019-10-29 Thread Xingbo Jiang
Please vote on releasing the following candidate as Apache Spark version 3.0.0-preview. The vote is open until November 2 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.0.0-preview [ ] -1 Do not release this package

Packages to release in 3.0.0-preview

2019-10-25 Thread Xingbo Jiang
Hi all, I would like to start a discussion on which packages should be released in 3.0.0-preview. The ones I can think of now are: * scala 2.12 + hadoop 2.7 * scala 2.12 + hadoop 3.2 * scala 2.12 + hadoop 3.2 + JDK 11 Do you have other combinations to add to the above list? Cheers, Xingbo

Unable to resolve dependency of sbt-mima-plugin since yesterday

2019-10-22 Thread Xingbo Jiang
Hi, Does anyone know why the `./dev/lint-scala` check has been failing since yesterday with the following message? WARNING: An illegal reflective access operation has occurred > 9 > WARNING: > Illegal reflective

Re: [PMCs] Any project news or announcements this week?

2019-10-22 Thread Xingbo Jiang
I'm working with Wenchen to make a release candidate, but we haven't yet succeeded with our release script. I think we should still be able to make a Spark 3.0 preview release this week, though. cc @Sally Cheers, Xingbo On Sun, Oct 20, 2019 at 11:04 PM Sean Owen wrote: > I wonder if we are likely to have a Spark

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-17 Thread Xingbo Jiang
Oct 17, 2019 at 12:28 AM Xingbo Jiang > wrote: > > > > How about add `3.0.0-preview` tag on master branch, and claim that for > the preview release, we won't consider bugs introduced by new features > merged into master after the first preview RC ? This could rule out the >

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-17 Thread Xingbo Jiang
ply that master is 3.1. We don't want to merge every new change into > >> two branches. > >> It may still be useful to have `branch-3.0-preview` as a short-lived > >> branch just used to manage the preview release, as we will need to let > >> development on 3.0 in

Re: branch-3.0 vs branch-3.0-preview (?)

2019-10-16 Thread Xingbo Jiang
Hi Dongjoon, I'm not sure about the best practice for maintaining a preview release branch. Since new features might still go into Spark 3.0 after the preview release, I guess it might make more sense to have separate branches for 3.0.0 and 3.0-preview. However, I'm open to both solutions, if we

Re: Spark 3.0 preview release feature list and major changes

2019-10-10 Thread Xingbo Jiang
Hi all, Here is the updated feature list: SPARK-11215 Multiple columns support added to various Transformers: StringIndexer SPARK-11150 Implement Dynamic Partition Pruning SPARK-13677

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Xingbo Jiang
In particular, I am interested in > https://issues.apache.org/jira/browse/SPARK-28006. I am happy to do the > work so want to make sure I don't miss the "cut" date. > > On Tue, Oct 8, 2019 at 4:53 PM Xingbo Jiang wrote: > >> Hi all, >> >> Thanks for all

Re: Spark 3.0 preview release feature list and major changes

2019-10-08 Thread Xingbo Jiang
Kafka source: offset by timestamp (starting/ending) >> SPARK-28074 <https://issues.apache.org/jira/browse/SPARK-28074> Log warn >> message on possible correctness issue for multiple stateful operations in >> single query >> >> and core side, >> >

Spark 3.0 preview release feature list and major changes

2019-10-07 Thread Xingbo Jiang
Hi all, I went over all the finished JIRA tickets targeted to Spark 3.0.0. Here I'm listing all the notable features and major changes that are ready to test/deliver; please don't hesitate to add more to the list: SPARK-11215 Multiple columns

Re: Spark 3.0 preview release on-going features discussion

2019-09-23 Thread Xingbo Jiang
>>> between now and then, the better. >>> I wouldn't necessarily present any list of things that will or might >>> be in 3.0 with that preview; just list the things that are done, like >>> JDK 11 support. >>> >>> On Fri, Sep 20, 2019 at 2:46

Spark 3.0 preview release on-going features discussion

2019-09-20 Thread Xingbo Jiang
Hi all, Let's start a new thread to discuss the ongoing features for the Spark 3.0 preview release. Below is the feature list for the Spark 3.0 preview release, collected from previous discussions on the dev list. - Follow-up of the shuffle+repartition correctness issue: support

Re: Thoughts on Spark 3 release, or a preview release

2019-09-13 Thread Xingbo Jiang
Hi all, I would like to volunteer to be the release manager of the Spark 3 preview, thanks! On Fri, Sep 13, 2019 at 11:21 AM Sean Owen wrote: > Well, great to hear the unanimous support for a Spark 3 preview > release. Now, I don't know how to make releases myself :) I would > first open it up to our revered

Re: Welcoming some new committers and PMC members

2019-09-09 Thread Xingbo Jiang
Congratulations! On Mon, Sep 9, 2019 at 7:49 PM Wenchen Fan wrote: > Congratulations! > > On Tue, Sep 10, 2019 at 10:19 AM Yuanjian Li > wrote: > >> Congratulations! >> >> On Tue, Sep 10, 2019 at 10:15 AM sujith chacko wrote: >> >>> Congratulations all. >>> >>> On Tue, 10 Sep 2019 at 7:27 AM, Haibo wrote: >>>

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xingbo Jiang
+1 on the updated SPIP On Tue, Mar 26, 2019 at 1:32 PM Xingbo Jiang wrote: > Hi all, > > Now we have had a few discussions over the updated SPIP, we also updated > the SPIP addressing new feedbacks from some committers. IMO the SPIP is > ready for another round of vote now. > On the updated

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-25 Thread Xingbo Jiang
e >>>>>>> could change this to a warning if everyone thinks that is better but I >>>>>>> personally like the error until we can implement the per lower level per >>>>>>> stage configuration. >>>>>>> >>>>

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-18 Thread Xingbo Jiang
Hi all, I updated the SPIP doc and stories; I hope it now contains a clear scope of the changes and

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
> > On Fri, Mar 1, 2019, 7:48 AM Xingbo Jiang wrote: > >> I think we are aligned on the commitment, I'll start a vote thread for >> this shortly. >> >> On Wed, Feb 27, 2019 at 6:47 AM Xiangrui Meng wrote: >> >>> In case there are issues visiting Google doc, I att

Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Starting with +1 from myself. On Fri, Mar 1, 2019 at 10:14 PM Xingbo Jiang wrote: > Hi all, > > I want to call for a vote of SPARK-24615 > <https://issues.apache.org/jira/browse/SPARK-24615>. It improves Spark by > making it aware of GPUs exposed by cluster managers, and hence Spark can

[VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
Hi all, I want to call for a vote of SPARK-24615. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. The proposal

Re: SPIP: Accelerator-aware Scheduling

2019-03-01 Thread Xingbo Jiang
I think we are aligned on the commitment; I'll start a vote thread for this shortly. On Wed, Feb 27, 2019 at 6:47 AM Xiangrui Meng wrote: > In case there are issues visiting Google doc, I attached PDF files to the > JIRA. > > On Tue, Feb 26, 2019 at 7:41 AM Xingbo Jiang > wrote: > >

SPIP: Accelerator-aware Scheduling

2019-02-26 Thread Xingbo Jiang
Hi all, I want to send a revised SPIP on implementing accelerator (GPU)-aware scheduling. It improves Spark by making it aware of GPUs exposed by cluster managers, and hence Spark can match GPU resources with user task requests properly. If you have scenarios that need to run workloads (DL/ML/Signal

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-13 Thread Xingbo Jiang
I'm working on the fix for SPARK-23243 and should be able to push another commit in one or two days. More detailed discussion can go to the PR. Thanks for pushing this issue forward! I really appreciate everyone's efforts in submitting PRs or joining the discussions

Re: code freeze and branch cut for Apache Spark 2.4

2018-08-01 Thread Xingbo Jiang
Regarding the code from the Hydrogen PRs: we didn't remove any of the existing logic, and I tried my best to hide almost all of the newly added logic behind an `isBarrier` tag (or something similar). I had to add some new variables and new methods to the core code paths, but I think they

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-25 Thread Xingbo Jiang
Xiangrui and I are leading an effort to implement a highly desirable feature, Barrier Execution Mode. https://issues.apache.org/jira/browse/SPARK-24374. This introduces a new scheduling model to Apache Spark so users can properly embed distributed DL training as a Spark stage to simplify the

[SPARK-24581] Design: BarrierTaskContext.barrier()

2018-07-24 Thread Xingbo Jiang
Hi All, This is follow-up work to [SPARK-24374] SPIP: Support Barrier Execution Mode in Apache Spark. https://docs.google.com/document/d/1r07-vU5JTH6s1jJ6azkmK0K5it6jwpfO6b_K3mJmxR4/edit?usp=sharing We need to provide a communication barrier
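As a rough analogy for the semantics under discussion (every task in a barrier stage blocks at `barrier()` until all tasks of that stage have reached it), the sketch below uses Python's standard `threading.Barrier`, with threads standing in for tasks. This is a toy under stated assumptions, not Spark's implementation, which must coordinate tasks running on different executors; the function name `run_barrier_stage` is hypothetical.

```python
import threading
from typing import List


def run_barrier_stage(num_tasks: int) -> List[str]:
    """Hypothetical toy model of barrier-mode semantics (threads stand in for tasks).

    Each "task" records an arrival, then waits at a shared barrier; no task
    records a pass event until every task has arrived.
    """
    barrier = threading.Barrier(num_tasks)  # releases only when num_tasks threads wait()
    events: List[str] = []
    lock = threading.Lock()  # protects the shared events list

    def task(task_id: int) -> None:
        with lock:
            events.append(f"arrive-{task_id}")
        barrier.wait()  # blocks until all tasks of the "stage" have called wait()
        with lock:
            events.append(f"pass-{task_id}")

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    print(run_barrier_stage(4))
```

Every `arrive-*` event is recorded before any `pass-*` event, which is the guarantee a barrier provides; the order among the arrivals themselves is nondeterministic.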

[DESIGN] Barrier Execution Mode

2018-07-08 Thread Xingbo Jiang
Hi All, I would like to invite you to review the design document for Barrier Execution Mode: https://docs.google.com/document/d/1GvcYR6ZFto3dOnjfLjZMtTezX0W5VYN9w1l4-tQXaZk/edit# TL;DR: We announced Project Hydrogen at the recent Spark+AI Summit. A major part of the project involves significant

Re: Time for 2.3.2?

2018-06-28 Thread Xingbo Jiang
+1 On Thu, Jun 28, 2018 at 2:06 PM Wenchen Fan wrote: > Hi Saisai, that's great! please go ahead! > > On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao > wrote: > >> +1, like mentioned by Marcelo, these issues seems quite severe. >> >> I can work on the release if short of hands :). >> >> Thanks >> Jerry >> >> >>

Re: [VOTE] [SPARK-24374] SPIP: Support Barrier Scheduling in Apache Spark

2018-06-01 Thread Xingbo Jiang
+1 2018-06-01 9:21 GMT-07:00 Xiangrui Meng : > Hi all, > > I want to call for a vote of SPARK-24374 > . It introduces a new > execution mode to Spark, which would help both integration with external > DL/AI frameworks and MLlib algorithm

Re: Clarify window behavior in Spark SQL

2018-04-03 Thread Xingbo Jiang
This is actually by design: without an `ORDER BY` clause, all rows are considered peers of the current row, which means that the frame is effectively the entire partition. This behavior follows PostgreSQL's window function semantics. You can refer to the comment by yhuai:
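To make the rule concrete, here is a minimal pure-Python sketch (no Spark involved) of how `SUM(x) OVER (...)` resolves its frame within a single partition. It assumes the SQL-standard defaults: with no `ORDER BY`, every row is a peer, so the frame is the whole partition; with `ORDER BY`, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which also includes the current row's peers (rows with an equal sort key). The function name `window_sum` is hypothetical.

```python
from typing import List, Optional


def window_sum(values: List[int], order_keys: Optional[List[int]] = None) -> List[int]:
    """Hypothetical model of SUM(x) OVER ([ORDER BY k]) for one partition.

    - No ORDER BY: all rows are peers, so each row's frame is the whole partition.
    - ORDER BY k (ascending): default RANGE frame covers every row whose key is
      <= the current row's key, so ties (peers) are included.
    """
    if order_keys is None:
        total = sum(values)
        return [total for _ in values]
    result = []
    for key in order_keys:
        # Rows with a smaller or equal key fall inside the RANGE frame.
        result.append(sum(v for v, k in zip(values, order_keys) if k <= key))
    return result


if __name__ == "__main__":
    print(window_sum([1, 2, 3]))             # no ORDER BY
    print(window_sum([1, 2, 3], [1, 2, 2]))  # ORDER BY with a tie
```

For example, `window_sum([1, 2, 3])` gives `[6, 6, 6]`, while `window_sum([1, 2, 3], [1, 2, 2])` gives `[1, 6, 6]` because the last two rows tie on the sort key and are therefore peers.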

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-01 Thread Xingbo Jiang
Congrats & welcome! 2018-04-02 13:28 GMT+08:00 Wenchen Fan : > Hi all, > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > Zhenhua is the major contributor of the CBO project, and has been > contributing across several areas of Spark for a while,

Re: Welcoming some new committers

2018-03-02 Thread Xingbo Jiang
Congratulations to everyone! 2018-03-03 8:51 GMT+08:00 Ilan Filonenko : > Congrats to everyone! :) > > On Fri, Mar 2, 2018 at 7:34 PM Felix Cheung > wrote: > >> Congrats and welcome! >> >> -- >> *From:* Dongjoon Hyun

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-22 Thread Xingbo Jiang
+1 2018-02-23 11:26 GMT+08:00 Takuya UESHIN : > +1 > > On Fri, Feb 23, 2018 at 12:24 PM, Wenchen Fan wrote: > >> +1 >> >> On Fri, Feb 23, 2018 at 6:23 AM, Sameer Agarwal >> wrote: >> >>> Please vote on releasing the following

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Xingbo Jiang
+1 On Tue, Feb 20, 2018 at 1:09 PM Wenchen Fan wrote: > +1 > > On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin wrote: > >> +1 >> >> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal , >> wrote: >> >> this file shouldn't be included? >>>

Re: [VOTE] Spark 2.3.0 (RC2)

2018-02-04 Thread Xingbo Jiang
I filed another NPE issue in the Web UI; I believe this is a regression in 2.3: https://issues.apache.org/jira/browse/SPARK-23330 2018-02-01 10:38 GMT-08:00 Tom Graves : > I filed a jira [SPARK-23304] Spark SQL coalesce() against hive not > working - ASF JIRA