Re: [DISCUSS] SPIP: Declarative Pipelines

Sem Thu, 10 Apr 2025 17:13:41 -0700

+1 (non-binding)


On April 9, 2025 7:29:40 AM GMT+02:00, Rishab Joshi <[email protected]> 
wrote:
>+1 Exciting.
>Rishab Joshi
>
>On Tue, Apr 8, 2025, 10:04 PM Ruifeng Zheng <[email protected]> wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 12:57 PM Denny Lee <[email protected]> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Tue, Apr 8, 2025 at 9:53 PM Yuming Wang <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Apr 9, 2025 at 10:47 AM Jungtaek Lim <
>>>> [email protected]> wrote:
>>>>
>>>>> +1 looking forward to seeing this make progress!
>>>>>
>>>>> On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>>>> > +1
>>>>>> >
>>>>>> > I am actually pretty excited to have this. Happy to see this being
>>>>>> proposed.
>>>>>> >
>>>>>> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
>>>>>> >
>>>>>> > > +1. Super excited about this effort!
>>>>>> > >
>>>>>> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]>
>>>>>> wrote:
>>>>>> > >
>>>>>> > >> +1 I support this SPIP because it simplifies data pipeline
>>>>>> management and
>>>>>> > >> enhances error detection.
>>>>>> > >>
>>>>>> > >>
>>>>>> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]>
>>>>>> wrote:
>>>>>> > >>
>>>>>> > >>> Excited to see this heading toward open source — materialized
>>>>>> views and
>>>>>> > >>> other features will bring a lot of value.
>>>>>> > >>> +1 (non-binding)
>>>>>> > >>>
>>>>>> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]>
>>>>>> wrote:
>>>>>> > >>>
>>>>>> > >>>> Hi Khalid – the CLI in the current proposal will need to be
>>>>>> built on
>>>>>> > >>>> top of internal APIs for constructing and launching pipeline
>>>>>> executions.
>>>>>> > >>>> We'll have the option to expose these in the future.
>>>>>> > >>>>
>>>>>> > >>>> It would be worthwhile to understand the use cases in more
>>>>>> depth before
>>>>>> > >>>> exposing these, because APIs are one-way doors and can be
>>>>>> costly to
>>>>>> > >>>> maintain.
>>>>>> > >>>>
>>>>>> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>>>>> > >>>> [email protected]> wrote:
>>>>>> > >>>>
>>>>>> > >>>>> Looks great!
>>>>>> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e.,
>>>>>> > >>>>> can I trigger a pipeline from *driver* code based on some condition,
>>>>>> > >>>>> or must it be executed via a separate shell command?
>>>>>> > >>>>> As background, Databricks imposes a similar limitation: for some
>>>>>> > >>>>> reason you cannot run normal Spark code and DLT on the same cluster,
>>>>>> > >>>>> which forces you to use two clusters, increasing cost and latency.
>>>>>> > >>>>>
>>>>>> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]>
>>>>>> wrote:
>>>>>> > >>>>>
>>>>>> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've
>>>>>> been
>>>>>> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie
>>>>>> Yang: [JIRA
>>>>>> > >>>>>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>>>>>> > >>>>>> <
>>>>>> https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0
>>>>>> >
>>>>>> > >>>>>> ].
>>>>>> > >>>>>>
>>>>>> > >>>>>> The SPIP proposes extending Spark's lazy, declarative
>>>>>> execution model
>>>>>> > >>>>>> beyond single queries, to pipelines that keep multiple
>>>>>> datasets up to date.
>>>>>> > >>>>>> It introduces the ability to compose multiple transformations
>>>>>> into a single
>>>>>> > >>>>>> declarative dataflow graph.
>>>>>> > >>>>>>
>>>>>> > >>>>>> Declarative pipelines aim to simplify the development and
>>>>>> > >>>>>> management of data pipelines by removing the need for manual
>>>>>> > >>>>>> orchestration of dependencies and making it possible to catch
>>>>>> > >>>>>> many errors before any execution steps are launched.
>>>>>> > >>>>>>
>>>>>> > >>>>>> Declarative pipelines can include both batch and streaming
>>>>>> > >>>>>> computations, leveraging Structured Streaming for stream
>>>>>> processing and new
>>>>>> > >>>>>> materialized view syntax for batch processing. Tight
>>>>>> integration with Spark
>>>>>> > >>>>>> SQL's analyzer enables deeper analysis and earlier error
>>>>>> detection than is
>>>>>> > >>>>>> achievable with more generic frameworks.
>>>>>> > >>>>>>
>>>>>> > >>>>>> Let us know what you think!
>>>>>> > >>>>>>
>>>>>> > >>>>>>
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
