Re: [DISCUSS] SPIP: Declarative Pipelines

Yang Jie Tue, 08 Apr 2025 19:32:05 -0700

+1

On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
> +1
> 
> I am actually pretty excited to have this. Happy to see this being proposed.
> 
> On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
> 
> > +1. Super excited about this effort!
> >
> > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]> wrote:
> >
> >> +1 I support this SPIP because it simplifies data pipeline management and
> >> enhances error detection.
> >>
> >>
> >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:
> >>
> >>> Excited to see this heading toward open source — materialized views and
> >>> other features will bring a lot of value.
> >>> +1 (non-binding)
> >>>
> >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
> >>>
> >>>> Hi Khalid – the CLI in the current proposal will need to be built on
> >>>> top of internal APIs for constructing and launching pipeline executions.
> >>>> We'll have the option to expose these in the future.
> >>>>
> >>>> It would be worthwhile to understand the use cases in more depth before
> >>>> exposing these, because APIs are one-way doors and can be costly to
> >>>> maintain.
> >>>>
> >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
> >>>> [email protected]> wrote:
> >>>>
> >>>>> Looks great!
> >>>>> QQ: will users be able to run this pipeline from normal code? I.e., can I
> >>>>> trigger a pipeline from *driver* code based on some condition, or
> >>>>> must it be executed via a separate shell command?
> >>>>> For background: Databricks imposes a similar limitation, whereby you
> >>>>> cannot run normal Spark code and DLT on the same cluster for some reason,
> >>>>> which forces you to use two clusters, increasing cost and latency.
> >>>>>
> >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
> >>>>>
> >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
> >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
> >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
> >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
> >>>>>>
> >>>>>> The SPIP proposes extending Spark's lazy, declarative execution model
> >>>>>> beyond single queries, to pipelines that keep multiple datasets up to
> >>>>>> date. It introduces the ability to compose multiple transformations
> >>>>>> into a single declarative dataflow graph.
> >>>>>>
> >>>>>> Declarative pipelines aim to simplify the development and management
> >>>>>> of data pipelines by removing the need for manual orchestration of
> >>>>>> dependencies and making it possible to catch many errors before any
> >>>>>> execution steps are launched.
> >>>>>>
> >>>>>> Declarative pipelines can include both batch and streaming
> >>>>>> computations, leveraging Structured Streaming for stream processing
> >>>>>> and new materialized view syntax for batch processing. Tight
> >>>>>> integration with Spark SQL's analyzer enables deeper analysis and
> >>>>>> earlier error detection than is achievable with more generic
> >>>>>> frameworks.
> >>>>>>
> >>>>>> Let us know what you think!
> >>>>>>
> >>>>>>
>

