+1. Super excited about this effort!

On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
> +1 I support this SPIP because it simplifies data pipeline management and
> enhances error detection.
>
> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <dkbis...@gmail.com> wrote:
>
>> Excited to see this heading toward open source — materialized views and
>> other features will bring a lot of value.
>> +1 (non-binding)
>>
>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:
>>
>>> Hi Khalid – the CLI in the current proposal will need to be built on
>>> top of internal APIs for constructing and launching pipeline
>>> executions. We'll have the option to expose these in the future.
>>>
>>> It would be worthwhile to understand the use cases in more depth
>>> before exposing these, because APIs are one-way doors and can be
>>> costly to maintain.
>>>
>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <
>>> khalidmammad...@gmail.com> wrote:
>>>
>>>> Looks great!
>>>> QQ: will users be able to run this pipeline from normal code? I.e.,
>>>> can I trigger a pipeline from *driver* code based on some condition,
>>>> or must it be executed via a separate shell command?
>>>> For background, Databricks imposes a similar limitation, where you
>>>> cannot run normal Spark code and DLT on the same cluster, forcing you
>>>> to use two clusters and increasing cost and latency.
>>>>
>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>>>>
>>>>> Hi all – starting a discussion thread for a SPIP that I've been
>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
>>>>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>>>>> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>>>
>>>>> The SPIP proposes extending Spark's lazy, declarative execution
>>>>> model beyond single queries to pipelines that keep multiple datasets
>>>>> up to date. It introduces the ability to compose multiple
>>>>> transformations into a single declarative dataflow graph.
>>>>>
>>>>> Declarative pipelines aim to simplify the development and management
>>>>> of data pipelines by removing the need for manual orchestration of
>>>>> dependencies and by making it possible to catch many errors before
>>>>> any execution steps are launched.
>>>>>
>>>>> Declarative pipelines can include both batch and streaming
>>>>> computations, leveraging Structured Streaming for stream processing
>>>>> and new materialized view syntax for batch processing. Tight
>>>>> integration with Spark SQL's analyzer enables deeper analysis and
>>>>> earlier error detection than is achievable with more generic
>>>>> frameworks.
>>>>>
>>>>> Let us know what you think!
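
To make the model described in the announcement concrete, here is a
minimal sketch of what a pipeline definition could look like. Everything
in it is an assumption for illustration, not API confirmed by the SPIP
doc: the "pyspark.pipelines" module name, the decorator names, and the
idea that the pipeline runner supplies an active SparkSession are all
guesses.

# Hypothetical sketch only: module and decorator names are assumptions,
# not confirmed API from the SPIP.
from pyspark import pipelines as dp  # assumed module name
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

# Assumption: the pipeline runner provides an active session.
spark = SparkSession.getActiveSession()


@dp.table  # assumed decorator: a streaming table, fed by Structured Streaming
def raw_events() -> DataFrame:
    return spark.readStream.format("rate").load()


@dp.materialized_view  # assumed decorator: a batch materialized view
def events_per_second() -> DataFrame:
    # Reading "raw_events" by name is how the framework could infer the
    # dependency edge raw_events -> events_per_second, so the file
    # declares a dataflow graph rather than an execution order.
    return (
        spark.read.table("raw_events")
        .groupBy(F.window("timestamp", "1 second"))
        .count()
    )

The point, per the announcement above, is that the graph is declared
rather than orchestrated: because definitions pass through Spark SQL's
analyzer, errors such as a missing table or column can surface before any
execution steps are launched, and the runner (e.g., the proposal's CLI)
decides how to bring all datasets up to date.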