Re: [DISCUSS] SPIP: Declarative Pipelines

Walaa Eldin Moustafa Thu, 10 Apr 2025 21:44:05 -0700

This sounds quite interesting.

+1 to what Szehon said about excitement around MVs. Happy to collaborate.


On Wed, Apr 9, 2025 at 5:29 PM Ángel Álvarez Pascua <
[email protected]> wrote:

> +1 (non-binding)
>
> On Thu, 10 Apr 2025, 1:50, Burak Yavuz <[email protected]> wrote:
>
>> +1
>>
>> On Wed, Apr 9, 2025 at 4:33 PM Szehon Ho <[email protected]> wrote:
>>
>>> +1 really excited to see Materialized Views finally make their way
>>> to Spark, as many other ecosystem projects (Trino, StarRocks, soon Iceberg)
>>> already support them.
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Wed, Apr 9, 2025 at 2:33 AM Martin Grund
>>> <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> On Wed, Apr 9, 2025 at 9:37 AM Mich Talebzadeh <
>>>> [email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Dr Mich Talebzadeh,
>>>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>>>
>>>>>    view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, 9 Apr 2025 at 08:07, Peter Toth <[email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Wed, Apr 9, 2025 at 8:51 AM Cheng Pan <[email protected]> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Glad to see Spark SQL extended to streaming use cases.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cheng Pan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Apr 9, 2025, at 14:43, Anton Okolnychyi <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Tue, 8 Apr 2025 at 23:36, Jacky Lee <[email protected]> wrote:
>>>>>>>
>>>>>>>> +1 I'm delighted that it will be open-sourced, enabling greater
>>>>>>>> integration with Iceberg/Delta to unlock more value.
>>>>>>>>
>>>>>>>> On Wed, Apr 9, 2025 at 10:47, Jungtaek Lim <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > +1 looking forward to seeing this make progress!
>>>>>>>> >
>>>>>>>> > On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]>
>>>>>>>> wrote:
>>>>>>>> >>
>>>>>>>> >> +1
>>>>>>>> >>
>>>>>>>> >> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
>>>>>>>> >> > +1
>>>>>>>> >> >
>>>>>>>> >> > I am actually pretty excited to have this. Happy to see this
>>>>>>>> being proposed.
>>>>>>>> >> >
>>>>>>>> >> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]>
>>>>>>>> wrote:
>>>>>>>> >> >
>>>>>>>> >> > > +1. Super excited about this effort!
>>>>>>>> >> > >
>>>>>>>> >> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >> > >
>>>>>>>> >> > >> +1 I support this SPIP because it simplifies data pipeline
>>>>>>>> management and
>>>>>>>> >> > >> enhances error detection.
>>>>>>>> >> > >>
>>>>>>>> >> > >>
>>>>>>>> >> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >> > >>
>>>>>>>> >> > >>> Excited to see this heading toward open source —
>>>>>>>> materialized views and
>>>>>>>> >> > >>> other features will bring a lot of value.
>>>>>>>> >> > >>> +1 (non-binding)
>>>>>>>> >> > >>>
>>>>>>>> >> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >> > >>>
>>>>>>>> >> > >>>> Hi Khalid – the CLI in the current proposal will need to
>>>>>>>> be built on
>>>>>>>> >> > >>>> top of internal APIs for constructing and launching
>>>>>>>> pipeline executions.
>>>>>>>> >> > >>>> We'll have the option to expose these in the future.
>>>>>>>> >> > >>>>
>>>>>>>> >> > >>>> It would be worthwhile to understand the use cases in
>>>>>>>> more depth before
>>>>>>>> >> > >>>> exposing these, because APIs are one-way doors and can be
>>>>>>>> costly to
>>>>>>>> >> > >>>> maintain.
>>>>>>>> >> > >>>>
>>>>>>>> >> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>>>>>>> >> > >>>> [email protected]> wrote:
>>>>>>>> >> > >>>>
>>>>>>>> >> > >>>>> Looks great!
>>>>>>>> >> > >>>>> QQ: will users be able to run this pipeline from normal
>>>>>>>> code? I.e., can I
>>>>>>>> >> > >>>>> trigger a pipeline from *driver* code based on some
>>>>>>>> condition, etc., or
>>>>>>>> >> > >>>>> must it be executed via a separate shell command?
>>>>>>>> >> > >>>>> As background, Databricks imposes a similar limitation
>>>>>>>> whereby you
>>>>>>>> >> > >>>>> cannot run normal Spark code and DLT on the same cluster
>>>>>>>> for some reason,
>>>>>>>> >> > >>>>> which forces the use of two clusters, increasing cost and
>>>>>>>> latency.
>>>>>>>> >> > >>>>>
>>>>>>>> >> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <
>>>>>>>> [email protected]> wrote:
>>>>>>>> >> > >>>>>
>>>>>>>> >> > >>>>>> Hi all – starting a discussion thread for a SPIP that
>>>>>>>> I've been
>>>>>>>> >> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
>>>>>>>> >> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
>>>>>>>> >> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>>>>> >> > >>>>>>
>>>>>>>> >> > >>>>>> The SPIP proposes extending Spark's lazy, declarative
>>>>>>>> execution model
>>>>>>>> >> > >>>>>> beyond single queries, to pipelines that keep multiple
>>>>>>>> datasets up to date.
>>>>>>>> >> > >>>>>> It introduces the ability to compose multiple
>>>>>>>> transformations into a single
>>>>>>>> >> > >>>>>> declarative dataflow graph.
>>>>>>>> >> > >>>>>>
>>>>>>>> >> > >>>>>> Declarative pipelines aim to simplify the development
>>>>>>>> and management
>>>>>>>> >> > >>>>>> of data pipelines by removing the need for manual
>>>>>>>> orchestration of
>>>>>>>> >> > >>>>>> dependencies and making it possible to catch many
>>>>>>>> errors before any
>>>>>>>> >> > >>>>>> execution steps are launched.
>>>>>>>> >> > >>>>>>
>>>>>>>> >> > >>>>>> Declarative pipelines can include both batch and
>>>>>>>> streaming
>>>>>>>> >> > >>>>>> computations, leveraging Structured Streaming for
>>>>>>>> stream processing and new
>>>>>>>> >> > >>>>>> materialized view syntax for batch processing. Tight
>>>>>>>> integration with Spark
>>>>>>>> >> > >>>>>> SQL's analyzer enables deeper analysis and earlier
>>>>>>>> error detection than is
>>>>>>>> >> > >>>>>> achievable with more generic frameworks.
>>>>>>>> >> > >>>>>>
>>>>>>>> >> > >>>>>> Let us know what you think!
>>>>>>>> >> > >>>>>>
>>>>>>>> >> > >>>>>>
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> >> To unsubscribe e-mail: [email protected]
>>>>>>>> >>
>>>>>>>>
>>>>>>>>
>>>>>>>
