Re: [DISCUSS] SPIP: Declarative Pipelines

Jungtaek Lim Thu, 10 Apr 2025 13:23:42 -0700

+1 looking forward to seeing this make progress!

On Wed, Apr 9, 2025 at 11:32 AM Yang Jie <[email protected]> wrote:


> +1
>
> On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
> > +1
> >
> > I am actually pretty excited to have this. Happy to see this being
> > proposed.
> >
> > On Wed, 9 Apr 2025 at 01:55, Chao Sun <[email protected]> wrote:
> >
> > > +1. Super excited about this effort!
> > >
> > > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <[email protected]> wrote:
> > >
> > >> +1 I support this SPIP because it simplifies data pipeline management
> > >> and enhances error detection.
> > >>
> > >>
> > >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <[email protected]> wrote:
> > >>
> > >>> Excited to see this heading toward open source. Materialized views
> > >>> and other features will bring a lot of value.
> > >>> +1 (non-binding)
> > >>>
> > >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <[email protected]> wrote:
> > >>>
> > >>>> Hi Khalid – the CLI in the current proposal will need to be built on
> > >>>> top of internal APIs for constructing and launching pipeline
> > >>>> executions. We'll have the option to expose these in the future.
> > >>>>
> > >>>> It would be worthwhile to understand the use cases in more depth
> > >>>> before exposing these, because APIs are one-way doors and can be
> > >>>> costly to maintain.
> > >>>>
> > >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <[email protected]> wrote:
> > >>>>
> > >>>>> Looks great!
> > >>>>> QQ: will users be able to run this pipeline from normal code? I.e.,
> > >>>>> can I trigger a pipeline from *driver* code based on some condition,
> > >>>>> or must it be executed via a separate shell command?
> > >>>>> As background, Databricks imposes a similar limitation: for some
> > >>>>> reason you cannot run normal Spark code and DLT on the same cluster,
> > >>>>> which forces you to use two clusters, increasing cost and latency.
> > >>>>>
> > >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <[email protected]> wrote:
> > >>>>>
> > >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
> > >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
> > >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
> > >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
> > >>>>>>
> > >>>>>> The SPIP proposes extending Spark's lazy, declarative execution
> > >>>>>> model beyond single queries, to pipelines that keep multiple
> > >>>>>> datasets up to date. It introduces the ability to compose multiple
> > >>>>>> transformations into a single declarative dataflow graph.
> > >>>>>>
> > >>>>>> Declarative pipelines aim to simplify the development and
> > >>>>>> management of data pipelines by removing the need for manual
> > >>>>>> orchestration of dependencies and making it possible to catch many
> > >>>>>> errors before any execution steps are launched.
> > >>>>>>
> > >>>>>> Declarative pipelines can include both batch and streaming
> > >>>>>> computations, leveraging Structured Streaming for stream processing
> > >>>>>> and new materialized view syntax for batch processing. Tight
> > >>>>>> integration with Spark SQL's analyzer enables deeper analysis and
> > >>>>>> earlier error detection than is achievable with more generic
> > >>>>>> frameworks.
> > >>>>>>
> > >>>>>> Let us know what you think!
> > >>>>>>
> > >>>>>>
> >
>
