Excited to see this heading toward open source — materialized views and other features will bring a lot of value. +1 (non-binding)
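
For anyone skimming the thread, below is a minimal toy sketch of how I read
the programming model in the doc. The decorator, registry, and runner here
are my own illustrative assumptions, not the proposed API. It also speaks
to Khalid's question downthread: once constructing and launching a pipeline
is an API call, triggering it from driver code becomes trivial.

    # Toy sketch only (not the SPIP's API): a hypothetical decorator
    # registers each dataset-producing function, and a naive runner
    # materializes them.
    from pyspark.sql import DataFrame, SparkSession

    _registry = {}  # dataset name -> producing function

    def materialized_view(fn):
        # Hypothetical stand-in for the SPIP's materialized view
        # declaration; the real proposal would also track dependencies.
        _registry[fn.__name__] = fn
        return fn

    spark = SparkSession.builder.getOrCreate()

    @materialized_view
    def daily_orders() -> DataFrame:
        # Batch transformation over an assumed source table "raw_orders".
        return spark.read.table("raw_orders").groupBy("order_date").count()

    @materialized_view
    def top_days() -> DataFrame:
        # Reads daily_orders, so the graph has an edge daily_orders ->
        # top_days that a real runner would infer and schedule around.
        return daily_orders().orderBy("count", ascending=False).limit(10)

    def run_pipeline():
        # Naive execution: materialize every registered dataset. The
        # SPIP's point is that a real runner analyzes the whole graph up
        # front, so many errors surface before any step executes.
        for name, fn in _registry.items():
            fn().write.mode("overwrite").saveAsTable(name)

The draw of the declarative model, as the SPIP describes it, is that the
framework rather than the user owns dependency ordering and up-front
validation via Spark SQL's analyzer.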
On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:

> Hi Khalid – the CLI in the current proposal will need to be built on top
> of internal APIs for constructing and launching pipeline executions.
> We'll have the option to expose these in the future.
>
> It would be worthwhile to understand the use cases in more depth before
> exposing these, because APIs are one-way doors and can be costly to
> maintain.
>
> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov
> <khalidmammad...@gmail.com> wrote:
>
>> Looks great!
>> QQ: will users be able to run this pipeline from normal code? I.e., can
>> I trigger a pipeline from *driver* code based on some condition, or must
>> it be executed via a separate shell command?
>> For background: Databricks imposes a similar limitation, where you
>> cannot run normal Spark code and DLT on the same cluster for some
>> reason, which forces you to use two clusters, increasing both cost and
>> latency.
>>
>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>>
>>> Hi all – starting a discussion thread for a SPIP that I've been working
>>> on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
>>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>>> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>
>>> The SPIP proposes extending Spark's lazy, declarative execution model
>>> beyond single queries, to pipelines that keep multiple datasets up to
>>> date. It introduces the ability to compose multiple transformations
>>> into a single declarative dataflow graph.
>>>
>>> Declarative pipelines aim to simplify the development and management of
>>> data pipelines by removing the need for manual orchestration of
>>> dependencies and by making it possible to catch many errors before any
>>> execution steps are launched.
>>>
>>> Declarative pipelines can include both batch and streaming
>>> computations, leveraging Structured Streaming for stream processing and
>>> new materialized view syntax for batch processing. Tight integration
>>> with Spark SQL's analyzer enables deeper analysis and earlier error
>>> detection than is achievable with more generic frameworks.
>>>
>>> Let us know what you think!