Hi, Timo I believe I may have misapplied the query hints and table hints. The current proposal only applies to query hints. For table hints, I think we would need to create a view, apply the table hints to the view, and use it as the SEARCH_KEY function's input.
Best, Shengkai Shengkai Fang <fskm...@gmail.com> 于2025年4月17日周四 21:15写道: > Thanks Timo's explain. > > > Can you give an example for a complex temporal condition? > > For example, consider the following SQL statement: > > SELECT * > FROM src > LEFT JOIN dim FOR SYSTEM_TIME AS OF PROCTIME() > ON src.key = dim.key1 AND dim.key2 = 'literal_value'; > > In this scenario, I am uncertain how to replicate the same semantics using > the SEARCH_KEY syntax. > > > Naming of the SEARCH_KEY > > I think Lincoln prefer to use LOOKUP to replace SEARCH_KEY only. :) > > Best > Shengkai > > > Timo Walther <twal...@apache.org> 于2025年4月17日周四 14:33写道: > >> Hi everyone, >> >> great to see that this discussion gains momentum. Let me answer your >> questions below: >> >> @Hao >> >> 1. SEARCH_KEY planner representation and options >> >> Yes, internally nothing in the planner should change. The PTFs are just >> syntactic sugar to rules and nodes that already exist. Regarding the >> options, we already support passing options via lookup hints. See also >> [1]. The option keys are in sync with the hints, so currently I'm not >> proposing any new functionality. If there is a need, I'm sure we can >> extend the existing interfaces. >> >> 2. Do we need to introduce a `DataStream` resource in SQL first? >> >> Clear no. We want to stay SQL compliant. The result remains a table >> where the changeflag become an additional column in the table. All other >> operations on tables should still be possible. >> >> @Shengkai: >> >> 1. Timing of Option Consumption >> >> The planner is able to consume these options at an early stage. We can >> and should enforce literals at this location, so the map can be accessed >> via type inference. Similar to CallContext#getArgumentValue(). >> >> 2. Column Name Conflicts in SEARCH_KEY >> >> The resolution logic should be similar to SystemTypeInference [2]. Thus, >> conflicting columns can be named `value` and `value0`. >> >> 3. Details on Correlated PTFs >> >> I'm not suggesting a user-defined correlated PTF. Similar to how window >> TVF are "built-in PTFs" (and we added user-defined PTF later), the >> correlated PTFs only exist in planner. They actually already exist today >> if you use a PTF and LATERAL. They will be translated into a >> LogicalTableScan and LogicalCorrelate, but fail at code generation which >> only support regular table functions. >> >> So implementation-wise correlated PTFs are just a >> planner/optimizer/RelNode translation step. I will try to make this >> clearer in the FLIP. >> >> 4. Specifying Literal Values in SEARCH_KEY Function >> >> Can you give an example for a complex temporal condition? >> >> @Lincoln: >> >> 1. Naming of the SEARCH_KEY >> >> I don't have a strong opinion on naming here. My biggest concern was >> that both vector search or full text search are also a LOOKUP into a >> different system. So I wanted to focus on what we are looking for, which >> is KEY, VECTOR, or TEXT. We can also name it LOOKUP_KEY, LOOKUP_VECTOR, >> LOOKUP_TEXT. What do you think? >> >> Thanks, >> Timo >> >> [1] >> >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#lookup >> [2] >> >> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/inference/SystemTypeInference.java#L240 >> >> On 15.04.25 07:46, Lincoln Lee wrote: >> > +1 for the flip, the snapshot, changelog related ptfs are nice addition >> > to flink sql! >> > >> > I have a question about the naming of the SEARCH_KEY, considering >> > that it is a simplified alternative to lookup join, and the word >> 'search' >> > may be more likely to remind users of keyword searching which is a bit >> > different from joins in sql, would it be better to consider following >> the >> > naming of lookup, e.g., >> > ``` >> > SELECT * >> > FROM >> > t1, >> > LATERAL LOOKUP( >> > table => dim, >> > key => DESCRIPTOR(k1,k2), >> > t1, t2...) >> > ``` >> > >> > >> > Best, >> > Lincoln Lee >> > >> > >> > Shengkai Fang <fskm...@gmail.com> 于2025年4月15日周二 10:12写道: >> > >> >> Thanks for the FLIP, it helps a lot for us to develop features like >> >> VECTOR_SEARCH. But I have some questions about the FLIP: >> >> >> >> 1. Timing of Option Consumption for SEARCH_KEY Parameters >> >> For the FOR SYSTEM_TIME AS OF syntax, the planner leverages hints and >> >> catalog tables to load the table scan during the toRel phase. However, >> if >> >> users use the SEARCH_KEY function, is the planner able to consume these >> >> options at the same early stage? >> >> >> >> 2.Handling Column Name Conflicts in SEARCH_KEY Output Schema >> >> What is the output schema behavior for the SEARCH_KEY function when the >> >> left and right tables have columns with conflicting names? >> >> How can users resolve these name conflicts to ensure unambiguous >> access to >> >> the desired columns? >> >> >> >> 3. Details on Correlated PTFs >> >> Could you elaborate on correlated PTFs? What is the API design for >> >> implementing correlated PTFs? How does a PTF retrieve the required >> model >> >> and lookup function during execution? >> >> >> >> 4. Specifying Literal Values in SEARCH_KEY Function >> >> How can users include literal values in the SEARCH_KEY function >> parameters? >> >> The FOR SYSTEM_TIME AS OF syntax allows users to define complex >> temporal >> >> conditions. Does SEARCH_KEY support equivalent flexibility for >> specifying >> >> dynamic or literal values in its parameters? >> >> >> >> Best, >> >> Shengkai >> >> >> >> >> >> >> >> Hao Li <h...@confluent.io.invalid> 于2025年4月3日周四 00:12写道: >> >> >> >>> Hi Timo, >> >>> >> >>> Any question I have is what's the SEARCH_KEY result schema you have in >> >>> mind? Can it output multiple rows for every row in the left table or >> it >> >>> needs to pack the result in a single row as an array? >> >>> >> >>> Thanks, >> >>> Hao >> >>> >> >>> On Mon, Mar 24, 2025 at 10:20 AM Hao Li <h...@confluent.io> wrote: >> >>> >> >>>> Thanks Timo for the FLIP! This is a great improvement to the FLINK >> sql >> >>>> syntax around tables. I have two clarification questions: >> >>>> >> >>>> 1. For SEARCH_KEY >> >>>> ``` >> >>>> SELECT * >> >>>> FROM >> >>>> t_other, >> >>>> LATERAL SEARCH_KEY( >> >>>> input => t, >> >>>> on_key => DESCRIPTOR(k), >> >>>> lookup => t_other.name, >> >>>> options => MAP[ >> >>>> 'async', 'true', >> >>>> 'retry-predicate', 'lookup_miss', >> >>>> 'retry-strategy', 'fixed_delay', >> >>>> 'fixed-delay'='10s' >> >>>> ] >> >>>> ) >> >>>> ``` >> >>>> Table `t` needs to be an existing `LookupTableSource` [1], right? And >> >> we >> >>>> will rewrite it to `StreamPhysicalLookupJoin` [2] or similar >> operator >> >>>> during the physical optimization phase. >> >>>> Also to support passing options, we need to extend `LookupContext` >> [3] >> >> to >> >>>> have a `getOptions` or `getRuntimeOptions` method? >> >>>> >> >>>> 2. For FROM_CHANGELOG >> >>>> ``` >> >>>> SELECT * FROM FROM_CHANGELOG(s) AS t; >> >>>> ``` >> >>>> Do we need to introduce a `DataStream` resource in sql first? >> >>>> >> >>>> >> >>>> Hao >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> [1] >> >>>> >> >>>> >> >>> >> >> >> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java >> >>>> [2] >> >>>> >> >>>> >> >>> >> >> >> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41 >> >>>> [3] >> >>>> >> >>>> >> >>> >> >> >> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82 >> >>>> >> >>>> >> >>>> On Fri, Mar 21, 2025 at 6:25 AM Timo Walther <twal...@apache.org> >> >> wrote: >> >>>> >> >>>>> Hi everyone, >> >>>>> >> >>>>> I would like to start a discussion about FLIP-517: Better Handling >> of >> >>>>> Dynamic Table Primitives with PTFs [1]. >> >>>>> >> >>>>> In the past months, I have spent a significant amount of time with >> SQL >> >>>>> semantics and the SQL standard around PTFs, when designing and >> >>>>> implementing FLIP-440 [2]. For those of you that have not taken a >> look >> >>>>> into the standard, the concept of Polymorphic Table Functions (PTF) >> >>>>> enables syntax for implementing custom SQL operators. In my opinion, >> >>>>> they are kind of a revolution in the SQL language. PTFs can take >> >> scalar >> >>>>> values, tables, models (in Flink), and column lists as arguments. >> With >> >>>>> these primitives, we can further evolve shortcomings in the Flink >> SQL >> >>>>> language by leveraging syntax and semantics. >> >>>>> >> >>>>> I would like introduce a couple of built-in PTFs with the goal to >> make >> >>>>> the handling of dynamic tables easier for users. Once users >> understand >> >>>>> how a PTF works, they can easily select from a list of functions to >> >>>>> approach a table for snapshots, changelogs, or searching. >> >>>>> >> >>>>> The FLIP proposes: >> >>>>> >> >>>>> SNAPSHOT() >> >>>>> SEARCH_KEY() >> >>>>> TO_CHANGELOG() >> >>>>> FROM_CHANGELOG() >> >>>>> >> >>>>> I'm aware that this is a delicate topic, and might lead to >> >> controversial >> >>>>> discussions. I hope with concise naming and syntax the benefit over >> >> the >> >>>>> existing syntax becomes clear. >> >>>>> >> >>>>> There are more useful PTFs to come, but those are the ones that I >> >>>>> currently see as the most fundamental ones to tell a round story >> >> around >> >>>>> Flink SQL. >> >>>>> >> >>>>> Looking forward to your feedback. >> >>>>> >> >>>>> Thanks, >> >>>>> Timo >> >>>>> >> >>>>> [1] >> >>>>> >> >>>>> >> >>> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs >> >>>>> [2] >> >>>>> >> >>> >> >> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093 >> >>>>> >> >>>> >> >>> >> >> >> > >> >>