+1 for the FLIP, the snapshot and changelog related PTFs are a nice addition to Flink SQL!
I have a question about the naming of SEARCH_KEY: considering that it is a simplified alternative to a lookup join, the word 'search' may remind users of keyword searching, which is a bit different from joins in SQL. Would it be better to follow the naming of lookup, e.g.,

```
SELECT * FROM t1, LATERAL LOOKUP(
    table => dim,
    key => DESCRIPTOR(k1, k2),
    t1, t2...)
```

Best,
Lincoln Lee

Shengkai Fang <fskm...@gmail.com> wrote on Tue, Apr 15, 2025 at 10:12:

> Thanks for the FLIP, it helps a lot for us to develop features like
> VECTOR_SEARCH. But I have some questions about the FLIP:
>
> 1. Timing of Option Consumption for SEARCH_KEY Parameters
> For the FOR SYSTEM_TIME AS OF syntax, the planner leverages hints and
> catalog tables to load the table scan during the toRel phase. However, if
> users use the SEARCH_KEY function, is the planner able to consume these
> options at the same early stage?
>
> 2. Handling Column Name Conflicts in SEARCH_KEY Output Schema
> What is the output schema behavior for the SEARCH_KEY function when the
> left and right tables have columns with conflicting names? How can users
> resolve these name conflicts to ensure unambiguous access to the desired
> columns?
>
> 3. Details on Correlated PTFs
> Could you elaborate on correlated PTFs? What is the API design for
> implementing correlated PTFs? How does a PTF retrieve the required model
> and lookup function during execution?
>
> 4. Specifying Literal Values in the SEARCH_KEY Function
> How can users include literal values in the SEARCH_KEY function
> parameters? The FOR SYSTEM_TIME AS OF syntax allows users to define
> complex temporal conditions. Does SEARCH_KEY support equivalent
> flexibility for specifying dynamic or literal values in its parameters?
>
> Best,
> Shengkai
>
> Hao Li <h...@confluent.io.invalid> wrote on Thu, Apr 3, 2025 at 00:12:
>
> > Hi Timo,
> >
> > Another question I have is: what is the SEARCH_KEY result schema you
> > have in mind?
> > Can it output multiple rows for every row in the left table, or does it
> > need to pack the result into a single row as an array?
> >
> > Thanks,
> > Hao
> >
> > On Mon, Mar 24, 2025 at 10:20 AM Hao Li <h...@confluent.io> wrote:
> >
> > > Thanks Timo for the FLIP! This is a great improvement to the Flink SQL
> > > syntax around tables. I have two clarification questions:
> > >
> > > 1. For SEARCH_KEY
> > > ```
> > > SELECT *
> > > FROM
> > >   t_other,
> > >   LATERAL SEARCH_KEY(
> > >     input => t,
> > >     on_key => DESCRIPTOR(k),
> > >     lookup => t_other.name,
> > >     options => MAP[
> > >       'async', 'true',
> > >       'retry-predicate', 'lookup_miss',
> > >       'retry-strategy', 'fixed_delay',
> > >       'fixed-delay', '10s'
> > >     ]
> > >   )
> > > ```
> > > Table `t` needs to be an existing `LookupTableSource` [1], right? And
> > > we will rewrite it to `StreamPhysicalLookupJoin` [2] or a similar
> > > operator during the physical optimization phase.
> > > Also, to support passing options, do we need to extend
> > > `LookupContext` [3] to have a `getOptions` or `getRuntimeOptions`
> > > method?
> > >
> > > 2. For FROM_CHANGELOG
> > > ```
> > > SELECT * FROM FROM_CHANGELOG(s) AS t;
> > > ```
> > > Do we need to introduce a `DataStream` resource in SQL first?
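[Editorial note: for readers following the discussion, the existing lookup-join syntax that SEARCH_KEY would complement is the FOR SYSTEM_TIME AS OF form referenced throughout the thread. A minimal sketch; the table and column names are illustrative, not from the FLIP:

```
-- Enrich a stream of orders against a lookup table of rates,
-- resolving each key at the row's processing time:
SELECT o.order_id, o.currency, r.rate
FROM Orders AS o
  JOIN Rates FOR SYSTEM_TIME AS OF o.proc_time AS r
  ON o.currency = r.currency;
```

This is the syntax whose option handling (hints, early planning during toRel) Shengkai's question 1 compares against.]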
> > >
> > > Thanks,
> > > Hao
> > >
> > > [1]
> > > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java
> > > [2]
> > > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41
> > > [3]
> > > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82
> > >
> > > On Fri, Mar 21, 2025 at 6:25 AM Timo Walther <twal...@apache.org> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to start a discussion about FLIP-517: Better Handling of
> > >> Dynamic Table Primitives with PTFs [1].
> > >>
> > >> In the past months, I have spent a significant amount of time with SQL
> > >> semantics and the SQL standard around PTFs while designing and
> > >> implementing FLIP-440 [2]. For those of you who have not taken a look
> > >> into the standard, the concept of Polymorphic Table Functions (PTFs)
> > >> enables syntax for implementing custom SQL operators. In my opinion,
> > >> they are kind of a revolution in the SQL language. PTFs can take
> > >> scalar values, tables, models (in Flink), and column lists as
> > >> arguments. With these primitives, we can further evolve shortcomings
> > >> in the Flink SQL language by leveraging syntax and semantics.
> > >>
> > >> I would like to introduce a couple of built-in PTFs with the goal of
> > >> making the handling of dynamic tables easier for users. Once users
> > >> understand how a PTF works, they can easily select from a list of
> > >> functions to approach a table for snapshots, changelogs, or searching.
> > >>
> > >> The FLIP proposes:
> > >>
> > >> SNAPSHOT()
> > >> SEARCH_KEY()
> > >> TO_CHANGELOG()
> > >> FROM_CHANGELOG()
> > >>
> > >> I'm aware that this is a delicate topic and might lead to
> > >> controversial discussions. I hope that with concise naming and syntax
> > >> the benefit over the existing syntax becomes clear.
> > >>
> > >> There are more useful PTFs to come, but these are the ones that I
> > >> currently see as the most fundamental to tell a round story around
> > >> Flink SQL.
> > >>
> > >> Looking forward to your feedback.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >> [1]
> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs
> > >> [2]
> > >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
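[Editorial note: of the four proposed PTFs, the thread only shows concrete syntax for SEARCH_KEY and FROM_CHANGELOG. A hedged sketch of the two changelog functions; the first line is quoted verbatim from Hao's email, while the TO_CHANGELOG usage is a hypothetical analogue whose actual semantics are defined in the FLIP:

```
-- From the thread: interpret a source s of raw change records
-- as a dynamic table t:
SELECT * FROM FROM_CHANGELOG(s) AS t;

-- Hypothetical inverse: expose the changelog of a dynamic table t
-- as plain rows (exact signature per FLIP-517, not this thread):
SELECT * FROM TO_CHANGELOG(t);
```
]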