Excited to see this heading toward open source — materialized views and other features will bring a lot of value. +1 (non-binding)
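
For anyone skimming the thread, below is a minimal toy sketch of how I read
the programming model in the doc. The decorator, registry, and runner here
are my own illustrative assumptions, not the proposed API. It also speaks
to Khalid's question downthread: once constructing and launching a pipeline
is an API call, triggering it from driver code becomes trivial.

    # Toy sketch only (not the SPIP's API): a hypothetical decorator
    # registers each dataset-producing function, and a naive runner
    # materializes them.
    from pyspark.sql import DataFrame, SparkSession

    _registry = {}  # dataset name -> producing function

    def materialized_view(fn):
        # Hypothetical stand-in for the SPIP's materialized view
        # declaration; the real proposal would also track dependencies.
        _registry[fn.__name__] = fn
        return fn

    spark = SparkSession.builder.getOrCreate()

    @materialized_view
    def daily_orders() -> DataFrame:
        # Batch transformation over an assumed source table "raw_orders".
        return spark.read.table("raw_orders").groupBy("order_date").count()

    @materialized_view
    def top_days() -> DataFrame:
        # Reads daily_orders, so the graph has an edge daily_orders ->
        # top_days that a real runner would infer and schedule around.
        return daily_orders().orderBy("count", ascending=False).limit(10)

    def run_pipeline():
        # Naive execution: materialize every registered dataset. The
        # SPIP's point is that a real runner analyzes the whole graph up
        # front, so many errors surface before any step executes.
        for name, fn in _registry.items():
            fn().write.mode("overwrite").saveAsTable(name)

The draw of the declarative model, as the SPIP describes it, is that the
framework rather than the user owns dependency ordering and up-front
validation via Spark SQL's analyzer.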
On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:

> Hi Khalid – the CLI in the current proposal will need to be built on top
> of internal APIs for constructing and launching pipeline executions.
> We'll have the option to expose these in the future.
>
> It would be worthwhile to understand the use cases in more depth before
> exposing these, because APIs are one-way doors and can be costly to
> maintain.
>
> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov
> <khalidmammad...@gmail.com> wrote:
>
>> Looks great!
>> QQ: will users be able to run this pipeline from normal code? I.e., can
>> I trigger a pipeline from *driver* code based on some condition, or must
>> it be executed via a separate shell command?
>> For background: Databricks imposes a similar limitation, where you
>> cannot run normal Spark code and DLT on the same cluster for some
>> reason, which forces you to use two clusters, increasing both cost and
>> latency.
>>
>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
>>
>>> Hi all – starting a discussion thread for a SPIP that I've been working
>>> on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang: [JIRA
>>> <https://issues.apache.org/jira/browse/SPARK-51727>] [Doc
>>> <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
>>>
>>> The SPIP proposes extending Spark's lazy, declarative execution model
>>> beyond single queries, to pipelines that keep multiple datasets up to
>>> date. It introduces the ability to compose multiple transformations
>>> into a single declarative dataflow graph.
>>>
>>> Declarative pipelines aim to simplify the development and management of
>>> data pipelines by removing the need for manual orchestration of
>>> dependencies and by making it possible to catch many errors before any
>>> execution steps are launched.
>>>
>>> Declarative pipelines can include both batch and streaming
>>> computations, leveraging Structured Streaming for stream processing and
>>> new materialized view syntax for batch processing. Tight integration
>>> with Spark SQL's analyzer enables deeper analysis and earlier error
>>> detection than is achievable with more generic frameworks.
>>>
>>> Let us know what you think!