+1

On 2025/04/09 01:07:57 Hyukjin Kwon wrote:
> +1
>
> I am actually pretty excited to have this. Happy to see this being proposed.
>
> On Wed, 9 Apr 2025 at 01:55, Chao Sun <sunc...@apache.org> wrote:
>
> > +1. Super excited about this effort!
> >
> > On Tue, Apr 8, 2025 at 9:47 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
> >
> >> +1 I support this SPIP because it simplifies data pipeline management and
> >> enhances error detection.
> >>
> >>
> >> On Tue, Apr 8, 2025 at 9:33 AM Dilip Biswal <dkbis...@gmail.com> wrote:
> >>
> >>> Excited to see this heading toward open source: materialized views and
> >>> other features will bring a lot of value.
> >>> +1 (non-binding)
> >>>
> >>> On Mon, Apr 7, 2025 at 10:37 AM Sandy Ryza <sa...@apache.org> wrote:
> >>>
> >>>> Hi Khalid – the CLI in the current proposal will need to be built on
> >>>> top of internal APIs for constructing and launching pipeline executions.
> >>>> We'll have the option to expose these in the future.
> >>>>
> >>>> It would be worthwhile to understand the use cases in more depth before
> >>>> exposing these, because APIs are one-way doors and can be costly to
> >>>> maintain.
> >>>>
> >>>> On Sat, Apr 5, 2025 at 11:59 PM Khalid Mammadov <khalidmammad...@gmail.com> wrote:
> >>>>
> >>>>> Looks great!
> >>>>> QQ: will users be able to run this pipeline from normal code? I.e., can I
> >>>>> trigger a pipeline from *driver* code based on some condition, or
> >>>>> must it be executed via a separate shell command?
> >>>>> As background, Databricks imposes a similar limitation: you
> >>>>> cannot run normal Spark code and DLT on the same cluster for some reason
> >>>>> and are forced to use two clusters, increasing cost and latency.
> >>>>>
> >>>>> On Sat, 5 Apr 2025 at 23:03, Sandy Ryza <sa...@apache.org> wrote:
> >>>>>
> >>>>>> Hi all – starting a discussion thread for a SPIP that I've been
> >>>>>> working on with Chao Sun, Kent Yao, Yuming Wang, and Jie Yang:
> >>>>>> [JIRA <https://issues.apache.org/jira/browse/SPARK-51727>]
> >>>>>> [Doc <https://docs.google.com/document/d/1PsSTngFuRVEOvUGzp_25CQL1yfzFHFr02XdMfQ7jOM4/edit?tab=t.0>].
> >>>>>>
> >>>>>> The SPIP proposes extending Spark's lazy, declarative execution model
> >>>>>> beyond single queries, to pipelines that keep multiple datasets up to
> >>>>>> date. It introduces the ability to compose multiple transformations
> >>>>>> into a single declarative dataflow graph.
> >>>>>>
> >>>>>> Declarative pipelines aim to simplify the development and management
> >>>>>> of data pipelines, by removing the need for manual orchestration of
> >>>>>> dependencies and making it possible to catch many errors before any
> >>>>>> execution steps are launched.
> >>>>>>
> >>>>>> Declarative pipelines can include both batch and streaming
> >>>>>> computations, leveraging Structured Streaming for stream processing
> >>>>>> and new materialized view syntax for batch processing. Tight
> >>>>>> integration with Spark SQL's analyzer enables deeper analysis and
> >>>>>> earlier error detection than is achievable with more generic
> >>>>>> frameworks.
> >>>>>>
> >>>>>> Let us know what you think!