+1. I'm really excited to have this, and happy to see it being proposed.
On Wed, 9 Apr 2025 at 01:55, Chao Sun <sunc...@apache.org> wrote:

> +1. Super excited about this effort!
>
> On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>
>> +1. I support this SPIP because it simplifies data pipeline management and enhances error detection.
>>
>> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <dkbis...@gmail.com> wrote:
>>
>>> Excited to see this heading toward open source — materialized views and other features will bring a lot of value.
>>> +1 (non-binding)
>>>
>>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:
>>>
>>>> Hi Khalid – the CLI in the current proposal will need to be built on top of internal APIs for constructing and launching pipeline executions. We'll have the option to expose these in the future.
>>>>
>>>> It would be worthwhile to understand the use cases in more depth before exposing these, because APIs are one-way doors and can be costly to maintain.
>>>>
>>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <khalidmammad...@gmail.com> wrote:
>>>>
>>>>> Looks great!
>>>>> QQ: will users be able to run this pipeline from normal code? I.e., can I trigger a pipeline from *driver* code based on some condition, or must it be executed via a separate shell command?
>>>>> As background, Databricks imposes a similar limitation, where you cannot run normal Spark code and DLT on the same cluster, which forces you to use two clusters and increases cost and latency.
>>>>>
>>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>>>>>
>>>>>> Hi all – starting a discussion thread for a SPIP that I've been working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>>>
>>>>>> The SPIP proposes extending Spark's lazy, declarative execution model beyond single queries to pipelines that keep multiple datasets up to date. It introduces the ability to compose multiple transformations into a single declarative dataflow graph.
>>>>>>
>>>>>> Declarative pipelines aim to simplify the development and management of data pipelines by removing the need for manual orchestration of dependencies and by making it possible to catch many errors before any execution steps are launched.
>>>>>>
>>>>>> Declarative pipelines can include both batch and streaming computations, leveraging Structured Streaming for stream processing and new materialized view syntax for batch processing. Tight integration with Spark SQL's analyzer enables deeper analysis and earlier error detection than is achievable with more generic frameworks.
>>>>>>
>>>>>> Let us know what you think!
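
For concreteness, below is a rough sketch of what a pipeline definition along the lines described in the proposal might look like in Python. The pipelines module, decorator names, and dataset names are illustrative assumptions made for this example only, not the API specified in the SPIP doc:

    # Hypothetical sketch: the "pipelines" module and decorator names below are
    # assumptions for illustration; the real API is whatever the SPIP defines.
    from pyspark import pipelines as dp  # assumed module name
    from pyspark.sql import DataFrame, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Each decorated function declares a dataset and the query that defines it.
    # The framework builds the dataflow graph from the table references, so no
    # manual orchestration of dependencies is needed.

    @dp.materialized_view  # batch dataset, kept up to date by the framework
    def raw_orders() -> DataFrame:
        return spark.read.format("json").load("/landing/orders")

    @dp.materialized_view  # depends on raw_orders; ordering is inferred, not scripted
    def daily_totals() -> DataFrame:
        return (
            spark.read.table("raw_orders")
            .groupBy("order_date")
            .sum("amount")
        )

Because the whole graph is declared before anything runs, the analyzer can resolve each query and flag problems such as missing columns or type mismatches up front, which is the early error detection the proposal describes.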