Re: [DISCUSS] SPIP: Declarative Pipelines

Burak Yavuz Wed, 09 Apr 2025 21:39:31 -0700

+1

On Wed, Apr 9, 2025 at 4:33 PM Szehon Ho <[email protected]> wrote:


> +1, really excited to see Materialized Views finally make their way to
> Spark, as many other ecosystem projects (Trino, StarRocks, soon Iceberg)
> already support them.
>
> Thanks
> Szehon
>
> On Wed, Apr 9, 2025 at 2:33 AM Martin Grund <[email protected]>
> wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 9:37 AM Mich Talebzadeh <[email protected]>
>> wrote:
>>
>>> +1
>>>
>>> Dr Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, 9 Apr 2025 at 08:07, Peter Toth <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan <[email protected]> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Glad to see Spark SQL extended to streaming use cases.
>>>>>
>>>>> Thanks,
>>>>> Cheng Pan
>>>>>
>>>>>
>>>>>
>>>>> On Apr 9, 2025, at 14:43, Anton Okolnychyi <[email protected]>
>>>>> wrote:
>>>>>
>>>>> +1
>>>>>
>>>>> On Tue, Apr 8, 2025 at 23:36, Jacky Lee <[email protected]> wrote:
>>>>>
>>>>>> +1 I'm delighted that it will be open-sourced, enabling greater
>>>>>> integration with Iceberg/Delta to unlock more value.
>>>>>>
>>>>>> On Wed, Apr 9, 2025 at 10:47, Jungtaek Lim <[email protected]> wrote:
>>>>>> >
>>>>>> > +1 looking forward to seeing this make progress!
>>>>>> >
>>>>>> > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> +1
>>>>>> >>
>>>>>> >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>>>> >> > +1
>>>>>> >> >
>>>>>> >> > I am actually pretty excited to have this. Happy to see this
>>>>>> >> > being proposed.
>>>>>> >> >
>>>>>> >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]>
>>>>>> wrote:
>>>>>> >> >
>>>>>> >> > > +1. Super excited about this effort!
>>>>>> >> > >
>>>>>> >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <
>>>>>> [email protected]> wrote:
>>>>>> >> > >
>>>>>> >> > >> +1 I support this SPIP because it simplifies data pipeline
>>>>>> >> > >> management and enhances error detection.
>>>>>> >> > >>
>>>>>> >> > >>
>>>>>> >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <
>>>>>> [email protected]> wrote:
>>>>>> >> > >>
>>>>>> >> > >>> Excited to see this heading toward open source —
>>>>>> materialized views and
>>>>>> >> > >>> other features will bring a lot of value.
>>>>>> >> > >>> +1 (non-binding)
>>>>>> >> > >>>
>>>>>> >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]>
>>>>>> wrote:
>>>>>> >> > >>>
>>>>>> >> > >>>> Hi Khalid – the CLI in the current proposal will need to be built
>>>>>> >> > >>>> on top of internal APIs for constructing and launching pipeline
>>>>>> >> > >>>> executions. We'll have the option to expose these in the future.
>>>>>> >> > >>>>
>>>>>> >> > >>>> It would be worthwhile to understand the use cases in more depth
>>>>>> >> > >>>> before exposing these, because APIs are one-way doors and can be
>>>>>> >> > >>>> costly to maintain.
>>>>>> >> > >>>>
>>>>>> >> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>>>>> >> > >>>> [email protected]> wrote:
>>>>>> >> > >>>>
>>>>>> >> > >>>>> Looks great!
>>>>>> >> > >>>>> QQ: will users be able to run this pipeline from normal code?
>>>>>> >> > >>>>> I.e. can I trigger a pipeline from *driver* code based on some
>>>>>> >> > >>>>> condition etc., or must it be executed via a separate shell command?
>>>>>> >> > >>>>> As background, Databricks imposes a similar limitation: for some
>>>>>> >> > >>>>> reason you cannot run normal Spark code and DLT on the same
>>>>>> >> > >>>>> cluster, which forces you to use two clusters, increasing cost
>>>>>> >> > >>>>> and latency.
>>>>>> >> > >>>>>
>>>>>> >> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]>
>>>>>> wrote:
>>>>>> >> > >>>>>
>>>>>> >> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>>>>> >> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>>>>> >> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>>>>> >> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>>> >> > >>>>>>
>>>>>> >> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution
>>>>>> >> > >>>>>> model beyond single queries, to pipelines that keep multiple
>>>>>> >> > >>>>>> datasets up to date. It introduces the ability to compose
>>>>>> >> > >>>>>> multiple transformations into a single declarative dataflow graph.
>>>>>> >> > >>>>>>
>>>>>> >> > >>>>>> Declarative pipelines aim to simplify the development and
>>>>>> >> > >>>>>> management of data pipelines, by removing the need for manual
>>>>>> >> > >>>>>> orchestration of dependencies and making it possible to catch
>>>>>> >> > >>>>>> many errors before any execution steps are launched.
>>>>>> >> > >>>>>>
>>>>>> >> > >>>>>> Declarative pipelines can include both batch and streaming
>>>>>> >> > >>>>>> computations, leveraging Structured Streaming for stream
>>>>>> >> > >>>>>> processing and new materialized view syntax for batch processing.
>>>>>> >> > >>>>>> Tight integration with Spark SQL's analyzer enables deeper
>>>>>> >> > >>>>>> analysis and earlier error detection than is achievable with more
>>>>>> >> > >>>>>> generic frameworks.
>>>>>> >> > >>>>>>
>>>>>> >> > >>>>>> Let us know what you think!
>>>>>> >> > >>>>>>
>>>>>> >> > >>>>>>
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> ---------------------------------------------------------------------
>>>>>> >> To unsubscribe e-mail: [email protected]
>>>>>> >>
>>>>>>
>>>>>>
>>>>>>
>>>>>
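
For readers skimming the thread: the proposal quoted above talks about composing transformations into a single declarative dataflow graph, so that dependencies need no manual orchestration and many errors surface before anything runs. The following is only a toy sketch of that general idea in plain Python. It does not use Spark or the SPIP's API, and every name in it (`Pipeline`, `dataset`, `plan`, `execute`, the dataset names) is hypothetical and chosen for illustration.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+


class Pipeline:
    """Toy declarative pipeline: datasets are only declared until execute() is called."""

    def __init__(self):
        self._datasets = {}  # dataset name -> (input names, function)

    def dataset(self, name, inputs=()):
        """Decorator that registers a dataset definition without running it."""
        def register(fn):
            self._datasets[name] = (tuple(inputs), fn)
            return fn
        return register

    def plan(self):
        """Validate the whole graph and return an execution order.

        Raises ValueError before any dataset function runs if an input is
        undefined or the dependencies form a cycle.
        """
        for name, (inputs, _) in self._datasets.items():
            for dep in inputs:
                if dep not in self._datasets:
                    raise ValueError(f"{name!r} depends on undefined dataset {dep!r}")
        graph = {name: set(inputs) for name, (inputs, _) in self._datasets.items()}
        try:
            return list(TopologicalSorter(graph).static_order())
        except CycleError as exc:
            raise ValueError(f"dependency cycle: {exc.args[1]}") from exc

    def execute(self):
        """Run every dataset function in dependency order and return all results."""
        results = {}
        for name in self.plan():
            inputs, fn = self._datasets[name]
            results[name] = fn(*(results[dep] for dep in inputs))
        return results


pipeline = Pipeline()


# Declared out of dependency order on purpose: the graph decides the order.
@pipeline.dataset("order_totals", inputs=["clean_orders"])
def order_totals(clean_orders):
    return sum(order["amount"] for order in clean_orders)


@pipeline.dataset("clean_orders", inputs=["raw_orders"])
def clean_orders(raw_orders):
    return [order for order in raw_orders if order["amount"] > 0]


@pipeline.dataset("raw_orders")
def raw_orders():
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": -5}, {"id": 3, "amount": 7}]


results = pipeline.execute()
```

In this sketch the declaration order does not matter, and a misspelled input name is rejected by `plan()` before any dataset function is called. The real proposal relies on Spark SQL's analyzer for much deeper checks than this name-level validation.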
