Looks great!
QQ: will users be able to run this pipeline from normal code? I.e., can I
trigger a pipeline from *driver* code based on some condition, or must it
be executed via a separate shell command?
As background, Databricks imposes a similar limitation: for some reason you
cannot run normal Spark code and DLT on the same cluster, which forces you
to use two clusters, increasing both cost and latency.
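
To make the question concrete, here is a minimal sketch of the pattern I
have in mind, in PySpark. The `spark-pipelines run --spec` invocation is
just my reading of the proposed CLI and should be treated as illustrative;
the ask is whether something like it could be invoked natively from a live
session instead:

    import subprocess

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Ordinary driver code computes some condition...
    needs_refresh = (
        spark.table("ops.ingest_log").filter("processed = false").count() > 0
    )

    if needs_refresh:
        # ...but, as I read the SPIP, the only way to kick off the pipeline
        # is a separate shell command (command name and flags assumed from
        # the proposal), which is the extra hop I'd like to avoid:
        subprocess.run(
            ["spark-pipelines", "run", "--spec", "pipeline.yml"], check=True
        )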

On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:

> Hi all – starting a discussion thread for a SPIP that I've been working on
> with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>
> ].
>
> The SPIP proposes extending Spark's lazy, declarative execution model
> beyond single queries, to pipelines that keep multiple datasets up to date.
> It introduces the ability to compose multiple transformations into a single
> declarative dataflow graph.
>
> Declarative pipelines aim to simplify the development and management of
> data pipelines, by removing the need for manual orchestration of
> dependencies and making it possible to catch many errors before any
> execution steps are launched.
>
> Declarative pipelines can include both batch and streaming computations,
> leveraging Structured Streaming for stream processing and new materialized
> view syntax for batch processing. Tight integration with Spark SQL's
> analyzer enables deeper analysis and earlier error detection than is
> achievable with more generic frameworks.
>
> Let us know what you think!
>
>
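
For concreteness, here is roughly what I understand a pipeline definition
to look like under the SPIP, sketched in PySpark. The `pyspark.pipelines`
module path and the decorator names are assumptions based on my reading of
the design doc, not a settled API:

    from pyspark import pipelines as dp
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Batch: a materialized view the framework keeps up to date.
    @dp.materialized_view
    def clean_orders():
        return spark.read.table("raw_orders").where("status IS NOT NULL")

    # Streaming: incrementally updated via Structured Streaming
    # (decorator name assumed).
    @dp.table
    def orders_stream():
        return spark.readStream.table("raw_orders_stream")

    # Reading clean_orders below creates a dependency edge in the dataflow
    # graph, so the framework can order execution and catch missing or
    # misspelled references before anything runs.
    @dp.materialized_view
    def daily_totals():
        return (
            spark.read.table("clean_orders")
            .groupBy("order_date")
            .sum("amount")
        )

If definitions like these could be registered and triggered from a running
SparkSession, the driver-side condition check above would collapse into a
single cluster and a single process.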
