+1

On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu <zsxw...@gmail.com> wrote:
> +1
>
> This is exciting. I agree with Jerry that this SPIP and continuous
> processing are orthogonal. This SPIP itself would be a great improvement
> and impact most Structured Streaming users.
>
> Best Regards,
> Shixiong
>
> On Wed, Nov 30, 2022 at 6:57 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>
>> Thanks for all the clarifications and details Jerry, Jungtaek :-)
>> This looks like an exciting improvement to Structured Streaming - looking
>> forward to it becoming part of Apache Spark!
>>
>> Regards,
>> Mridul
>>
>> On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng <jerry.boyang.p...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I will add my two cents. Improving the microbatch execution engine does
>>> not prevent us from working on or improving the continuous execution
>>> engine in the future. These are orthogonal issues. The new mode I am
>>> proposing in the microbatch execution engine is intended to lower the
>>> latency of the execution engine that most people use today. We can view
>>> it as an incremental improvement on the existing engine. I see the
>>> continuous execution engine as a partially completed rewrite of Spark
>>> Streaming that may serve as the "future" engine powering Spark Streaming.
>>> Improving the "current" engine does not mean we cannot work on a "future"
>>> engine. These two are not mutually exclusive. I would like to focus the
>>> discussion on the merits of this feature with regard to the current
>>> microbatch execution engine, not on the future of the continuous
>>> execution engine.
>>>
>>> Best,
>>>
>>> Jerry
>>>
>>> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> Hi Mridul,
>>>>
>>>> I'd like to make this clear to avoid any misunderstanding - the decision
>>>> was not led by me. (I'm just one of the engineers in the team. Not even
>>>> TL.) As you see the direction, there was an internal consensus to not
>>>> revisit the continuous mode. There are various reasons, which I think we
>>>> know already. You seem to remember I have raised concerns about
>>>> continuous mode, but have you indicated that it was even over 2 years
>>>> ago? I still see no traction around the project. The main reason I
>>>> abandoned the discussion was due to promising effort on integrating push
>>>> based shuffle into continuous mode to achieve shuffle, but no effort has
>>>> been made so far.
>>>>
>>>> The goal of this SPIP is to have an alternative approach dealing with
>>>> the same workload, given that we no longer have confidence in the
>>>> success of continuous mode. But I also want to make clear that
>>>> deprecating and eventually retiring continuous mode is not a goal of
>>>> this project. If that happens eventually, that would be a side-effect.
>>>> Someone may have concerns that we have two different projects aiming for
>>>> a similar thing, but I'd rather see both projects having competition.
>>>> Anyone willing to improve continuous mode can start making the effort
>>>> right now. This SPIP does not block it.
>>>>
>>>> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan <mri...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jungtaek,
>>>>>
>>>>> Given the goal of the SPIP is reducing latency for stateless apps,
>>>>> and should reasonably fit continuous mode design goals, it feels odd
>>>>> to not support it in the proposal.
>>>>>
>>>>> I know you have raised concerns about continuous mode in the past as
>>>>> well on the dev@ list, and we are further ignoring it in this proposal
>>>>> (and possibly other enhancements in the past few releases).
>>>>>
>>>>> Do you want to revisit the discussion to support it and propose a vote
>>>>> on that? And move it to deprecated?
>>>>>
>>>>> I am much more comfortable not supporting this SPIP for CM if it was
>>>>> deprecated.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Regards,
>>>>> Mridul
>>>>>
>>>>> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <
>>>>> jerry.boyang.p...@gmail.com> wrote:
>>>>>
>>>>>> Jungtaek,
>>>>>>
>>>>>> Thanks for taking up the role to shepherd this SPIP! Thank you for
>>>>>> also chiming in with your thoughts concerning the continuous mode!
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jerry
>>>>>>
>>>>>> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Just FYI, I'm shepherding this SPIP project.
>>>>>>>
>>>>>>> I think the major meta question would be, "why don't we spend
>>>>>>> effort on continuous mode rather than initiating another feature
>>>>>>> aiming for the same workload?". Jerry already updated the doc to
>>>>>>> answer the question, but I can also share my thoughts about it.
>>>>>>>
>>>>>>> I feel like the current "continuous mode" is a niche solution. (It's
>>>>>>> not to blame. If you have to deal with such a workload but can't
>>>>>>> rewrite the underlying engine from scratch, then there are really few
>>>>>>> options.) Since the implementation went with a workaround to
>>>>>>> implement what the architecture does not support natively, e.g.
>>>>>>> distributed snapshot, it gets quite tricky to maintain and expand the
>>>>>>> project. It also requires 3rd parties to implement a separate source
>>>>>>> and sink implementation, and I'm not sure how many 3rd parties have
>>>>>>> actually followed so far.
>>>>>>>
>>>>>>> Eventually, "continuous mode" becomes an area whose details no one in
>>>>>>> the active community knows or is willing to maintain. I wouldn't say
>>>>>>> we are confident to remove the tag on "experimental", although the
>>>>>>> feature has been shipped for years. It was introduced in Spark 2.3,
>>>>>>> surprising enough?
>>>>>>>
>>>>>>> We went back and thought about the approach from scratch. Jerry came
>>>>>>> up with the idea which leverages existing microbatch execution, hence
>>>>>>> it is relatively stable and there is no need to require 3rd parties
>>>>>>> to support another mode. It adds complexity to microbatch execution
>>>>>>> but it's a lot less complicated compared to the existing continuous
>>>>>>> mode. Definitely far less than creating a new record-to-record engine
>>>>>>> from scratch.
>>>>>>>
>>>>>>> That said, we want to propose and move forward with the new approach.
>>>>>>>
>>>>>>> ps. Eventually we could probably discuss retiring continuous mode if
>>>>>>> the new approach gets accepted and is eventually considered a stable
>>>>>>> one after several minor releases. That's just me.
>>>>>>>
>>>>>>> On Wed, Nov 23, 2022 at 5:16 AM Jerry Peng <
>>>>>>> jerry.boyang.p...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I would like to start the discussion for a SPIP, Asynchronous
>>>>>>>> Offset Management in Structured Streaming. The high level summary
>>>>>>>> of the SPIP is that currently in Structured Streaming we perform a
>>>>>>>> couple of offset management operations for progress tracking
>>>>>>>> purposes synchronously on the critical path, which can contribute
>>>>>>>> significantly to processing latency. If we were to make these
>>>>>>>> operations asynchronous and less frequent, we could dramatically
>>>>>>>> improve latency for certain types of workloads.
>>>>>>>>
>>>>>>>> I have put together a SPIP to implement such a mechanism. Please
>>>>>>>> take a look!
>>>>>>>>
>>>>>>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-39591
>>>>>>>>
>>>>>>>> SPIP doc:
>>>>>>>> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Jerry
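
For readers skimming the thread, the idea in Jerry's original message (moving offset bookkeeping off the critical path and doing it less often) can be illustrated with a toy model. This is only a sketch under stated assumptions and not Spark's implementation: `OffsetLog`, `run_sync`, `run_async`, and `commit_every` are hypothetical names invented for this example, and the "durable write" is just an in-memory append.

```python
import queue
import threading


class OffsetLog:
    """Hypothetical stand-in for a durable offset log; not a Spark class."""

    def __init__(self):
        self.entries = []

    def write(self, batch_id, offset):
        # A real log would perform a slow, durable write here.
        self.entries.append((batch_id, offset))


def run_sync(offsets, log):
    """Baseline: persist the offset on the critical path for every batch."""
    processed = []
    for batch_id, offset in enumerate(offsets):
        log.write(batch_id, offset)  # processing waits for this write
        processed.append(offset)     # stand-in for running the batch
    return processed


def run_async(offsets, log, commit_every=3):
    """Sketch: a background thread persists offsets, only every N batches."""
    pending = queue.Queue()

    def writer():
        while True:
            item = pending.get()
            if item is None:  # sentinel: nothing more to persist
                return
            log.write(*item)

    thread = threading.Thread(target=writer)
    thread.start()
    processed = []
    last = len(offsets) - 1
    for batch_id, offset in enumerate(offsets):
        processed.append(offset)  # batch runs without waiting on the log
        if batch_id % commit_every == 0 or batch_id == last:
            pending.put((batch_id, offset))
    pending.put(None)
    thread.join()  # flush outstanding writes before returning
    return processed
```

In this toy model the trade-off is visible directly: the asynchronous variant records fewer offsets, so after a failure the batches processed since the last recorded offset would have to be replayed.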