On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu <zsxw...@gmail.com> wrote:

> +1
> This is exciting. I agree with Jerry that this SPIP and continuous
> processing are orthogonal. This SPIP itself would be a great improvement
> and impact most Structured Streaming users.
> Best Regards,
> Shixiong
> On Wed, Nov 30, 2022 at 6:57 AM Mridul Muralidharan <mri...@gmail.com>
> wrote:
>> Thanks for all the clarifications and details Jerry, Jungtaek :-)
>> This looks like an exciting improvement to Structured Streaming - looking
>> forward to it becoming part of Apache Spark !
>> Regards,
>> Mridul
>> On Mon, Nov 28, 2022 at 8:40 PM Jerry Peng <jerry.boyang.p...@gmail.com>
>> wrote:
>>> Hi all,
>>> I will add my two cents.  Improving the Microbatch execution engine does
>>> not prevent us from working/improving on the continuous execution engine in
>>> the future.  These are orthogonal issues.  This new mode I am proposing in
>>> the microbatch execution engine intends to lower latency of this execution
>>> engine that most people use today.  We can view it as an incremental
>>> improvement on the existing engine. I see the continuous execution engine
>>> as a partially completed re-write of spark streaming and may serve as the
>>> "future" engine powering Spark Streaming.   Improving the "current" engine
>>> does not mean we cannot work on a "future" engine.  These two are not
>>> mutually exclusive. I would like to focus the discussion on the merits of
>>> this feature in regards to the current micro-batch execution engine and not
>>> a discussion on the future of continuous execution engine.
>>> Best,
>>> Jerry
>>> On Wed, Nov 23, 2022 at 3:17 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>> Hi Mridul,
>>>> I'd like to make clear to avoid any misunderstanding - the decision was
>>>> not led by me. (I'm just one of the engineers in the team. Not even TL.) As
>>>> you see the direction, there was an internal consensus to not revisit the
>>>> continuous mode. There are various reasons, which I think we know already.
>>>> You seem to remember that I have raised concerns about continuous mode, but
>>>> have you noticed that it was over 2 years ago? I still see no traction
>>>> around the project. The main reason I abandoned the discussion was the
>>>> promised effort on integrating push-based shuffle into continuous mode to
>>>> achieve shuffle, but no effort has been made so far.
>>>> The goal of this SPIP is to have an alternative approach dealing with
>>>> the same workload, given that we no longer have confidence in the success
>>>> of continuous mode. But I also want to make clear that deprecating and
>>>> eventually retiring continuous mode is not a goal of this project. If that
>>>> happens eventually, that would be a side-effect. Someone may have concerns
>>>> that we have two different projects aiming for a similar thing, but I'd
>>>> rather see both projects having competition. Anyone willing to improve
>>>> continuous mode can start making the effort right now. This SPIP does not
>>>> block it.
>>>> On Wed, Nov 23, 2022 at 5:29 PM Mridul Muralidharan <mri...@gmail.com>
>>>> wrote:
>>>>> Hi Jungtaek,
>>>>>   Given the goal of the SPIP is reducing latency for stateless apps,
>>>>> and should reasonably fit continuous mode design goals, it feels odd to
>>>>> not support it in the proposal.
>>>>> I know you have raised concerns about continuous mode in the past as
>>>>> well on the dev@ list, and we are further ignoring it in this proposal
>>>>> (and possibly other enhancements in the past few releases).
>>>>> Do you want to revisit the discussion to support it and propose a vote
>>>>> on that ? And move it to deprecated ?
>>>>> I am much more comfortable not supporting this SPIP for CM if it was
>>>>> deprecated.
>>>>> Thoughts ?
>>>>> Regards,
>>>>> Mridul
>>>>> On Wed, Nov 23, 2022 at 1:16 AM Jerry Peng <
>>>>> jerry.boyang.p...@gmail.com> wrote:
>>>>>> Jungtaek,
>>>>>> Thanks for taking up the role to shepherd this SPIP!  Thank you for
>>>>>> also chiming in on your thoughts concerning the continuous mode!
>>>>>> Best,
>>>>>> Jerry
>>>>>> On Tue, Nov 22, 2022 at 5:57 PM Jungtaek Lim <
>>>>>> kabhwan.opensou...@gmail.com> wrote:
>>>>>>> Just FYI, I'm shepherding this SPIP project.
>>>>>>> I think the major meta question would be, "why don't we spend
>>>>>>> effort on continuous mode rather than initiating another feature aiming
>>>>>>> for the same workload?". Jerry already updated the doc to answer the
>>>>>>> question, but I can also share my thoughts about it.
>>>>>>> I feel like the current "continuous mode" is a niche solution. (It's
>>>>>>> not to blame. If you have to deal with such workload but can't rewrite
>>>>>>> the underlying engine from scratch, then there are really few options.)
>>>>>>> Since the implementation went with a workaround to implement what
>>>>>>> the architecture does not support natively, e.g. distributed snapshot,
>>>>>>> it gets quite tricky to maintain and expand the project. It also
>>>>>>> requires 3rd parties to implement a separate source and sink, and I'm
>>>>>>> not sure how many 3rd parties have actually done so.
>>>>>>> Eventually, "continuous mode" becomes an area whose details no one in
>>>>>>> the active community knows or is willing to maintain. I wouldn't say
>>>>>>> we are confident to remove the tag on "experimental", although the
>>>>>>> feature has been shipped for years. It was introduced in Spark 2.3,
>>>>>>> surprisingly enough.
>>>>>>> We went back and thought about the approach from scratch. Jerry came
>>>>>>> up with the idea which leverages existing microbatch execution, hence
>>>>>>> it is relatively stable and does not require 3rd parties to support
>>>>>>> another mode. It adds complexity to microbatch execution but it's a lot
>>>>>>> less complicated than the existing continuous mode. Definitely far less
>>>>>>> than creating a new record-to-record engine from scratch.
>>>>>>> That said, we want to propose and move forward with the new approach.
>>>>>>> ps. Eventually we could probably discuss retiring continuous mode if
>>>>>>> the new approach gets accepted and eventually considered as a stable one
>>>>>>> after several minor releases. That's just me.
>>>>>>> On Wed, Nov 23, 2022 at 5:16 AM Jerry Peng <
>>>>>>> jerry.boyang.p...@gmail.com> wrote:
>>>>>>>> Hi all,
>>>>>>>> I would like to start the discussion for a SPIP, Asynchronous
>>>>>>>> Offset Management in Structured Streaming.  The high level summary of
>>>>>>>> the SPIP is that currently in Structured Streaming we perform a couple
>>>>>>>> of offset management operations for progress tracking purposes
>>>>>>>> synchronously on the critical path which can contribute significantly
>>>>>>>> to processing latency.  If we were to make these operations
>>>>>>>> asynchronous and less frequent we can dramatically improve latency for
>>>>>>>> certain types of workloads.
>>>>>>>> I have put together a SPIP to implement such a mechanism.  Please
>>>>>>>> take a look!
>>>>>>>> SPIP Jira: https://issues.apache.org/jira/browse/SPARK-39591
>>>>>>>> SPIP doc:
>>>>>>>> https://docs.google.com/document/d/1iPiI4YoGCM0i61pBjkxcggU57gHKf2jVwD7HWMHgH-Y/edit?usp=sharing
>>>>>>>> Best,
>>>>>>>> Jerry
