Hi, Timo

I believe I may have mixed up query hints and table hints. The current
proposal only covers query hints. For table hints, I think we would need to
create a view, apply the table hints to the view, and use that view as the
input to the SEARCH_KEY function.
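
Just to illustrate what I have in mind, a rough sketch (the view name and
the concrete table option are only placeholders for this example):

CREATE TEMPORARY VIEW dim_hinted AS
SELECT * FROM dim /*+ OPTIONS('lookup.cache' = 'PARTIAL') */;

SELECT *
FROM src,
  LATERAL SEARCH_KEY(
    input => dim_hinted,
    on_key => DESCRIPTOR(key1),
    lookup => src.key
  );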

Best,
Shengkai


Shengkai Fang <fskm...@gmail.com> wrote on Thu, Apr 17, 2025 at 21:15:

> Thanks for the explanation, Timo.
>
> > Can you give an example for a complex temporal condition?
>
> For example, consider the following SQL statement:
>
> SELECT *
> FROM src
> LEFT JOIN dim FOR SYSTEM_TIME AS OF PROCTIME()
> ON src.key = dim.key1 AND dim.key2 = 'literal_value';
>
> In this scenario, I am uncertain how to replicate the same semantics using
> the SEARCH_KEY syntax.
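>
> The closest rewrite I can come up with (just a rough sketch, and I am not
> even sure the dim columns can be referenced like this) would move the
> literal condition into a WHERE clause, which no longer preserves the LEFT
> JOIN semantics:
>
> SELECT *
> FROM src,
>   LATERAL SEARCH_KEY(
>     input => dim,
>     on_key => DESCRIPTOR(key1),
>     lookup => src.key
>   )
> WHERE dim.key2 = 'literal_value';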
>
> > Naming of the SEARCH_KEY
>
> I think Lincoln prefers to use LOOKUP only as a replacement for SEARCH_KEY. :)
>
> Best
> Shengkai
>
>
> Timo Walther <twal...@apache.org> wrote on Thu, Apr 17, 2025 at 14:33:
>
>> Hi everyone,
>>
>> great to see that this discussion is gaining momentum. Let me answer your
>> questions below:
>>
>> @Hao
>>
>> 1. SEARCH_KEY planner representation and options
>>
>> Yes, internally nothing in the planner should change. The PTFs are just
>> syntactic sugar for rules and nodes that already exist. Regarding the
>> options, we already support passing options via lookup hints. See also
>> [1]. The option keys are in sync with the hints, so currently I'm not
>> proposing any new functionality. If there is a need, I'm sure we can
>> extend the existing interfaces.
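>>
>> For reference, here is a sketch of how those option keys already appear
>> as a LOOKUP query hint today (the table, column, and option values below
>> are just placeholders):
>>
>> SELECT /*+ LOOKUP('table'='dim', 'async'='true',
>>     'retry-predicate'='lookup_miss', 'retry-strategy'='fixed_delay',
>>     'fixed-delay'='10s', 'max-attempts'='3') */ *
>> FROM src
>> JOIN dim FOR SYSTEM_TIME AS OF src.proc_time AS d
>>   ON src.key = d.key;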
>>
>> 2. Do we need to introduce a `DataStream` resource in SQL first?
>>
>> A clear no. We want to stay SQL compliant. The result remains a table,
>> where the change flag becomes an additional column of the table. All other
>> operations on tables should still be possible.
>>
>> @Shengkai:
>>
>> 1. Timing of Option Consumption
>>
>> The planner is able to consume these options at an early stage. We can
>> and should enforce literals at this location, so the map can be accessed
>> via type inference. Similar to CallContext#getArgumentValue().
>>
>> 2. Column Name Conflicts in SEARCH_KEY
>>
>> The resolution logic should be similar to SystemTypeInference [2]. Thus,
>> conflicting columns can be named `value` and `value0`.
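>>
>> For illustration, a sketch assuming both `t_other` and `t` expose a
>> column named `value` (all names here are just placeholders), so the
>> joined result could be queried roughly like this:
>>
>> SELECT value, value0
>> FROM t_other,
>>   LATERAL SEARCH_KEY(
>>     input => t,
>>     on_key => DESCRIPTOR(k),
>>     lookup => t_other.name
>>   );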
>>
>> 3. Details on Correlated PTFs
>>
>> I'm not suggesting a user-defined correlated PTF. Similar to how window
>> TVFs are "built-in PTFs" (and we added user-defined PTFs later), the
>> correlated PTFs only exist in the planner. They actually already exist
>> today if you use a PTF and LATERAL. They will be translated into a
>> LogicalTableScan and LogicalCorrelate, but fail at code generation, which
>> only supports regular table functions.
>>
>> So implementation-wise correlated PTFs are just a
>> planner/optimizer/RelNode translation step. I will try to make this
>> clearer in the FLIP.
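>>
>> To make the translation step a bit more concrete, here is a sketch (with
>> a made-up user-defined PTF `my_ptf` and placeholder argument names) of a
>> query that already ends up as a LogicalCorrelate over a LogicalTableScan
>> today, but cannot be code-generated yet:
>>
>> SELECT *
>> FROM t_other,
>>   LATERAL my_ptf(input => TABLE t, search => t_other.name);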
>>
>> 4. Specifying Literal Values in SEARCH_KEY Function
>>
>> Can you give an example for a complex temporal condition?
>>
>> @Lincoln:
>>
>> 1. Naming of the SEARCH_KEY
>>
>> I don't have a strong opinion on naming here. My biggest concern was
>> that both vector search and full-text search are also a LOOKUP into a
>> different system. So I wanted to focus on what we are looking for, which
>> is KEY, VECTOR, or TEXT. We can also name it LOOKUP_KEY, LOOKUP_VECTOR,
>> LOOKUP_TEXT. What do you think?
>>
>> Thanks,
>> Timo
>>
>> [1]
>>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#lookup
>> [2]
>>
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/inference/SystemTypeInference.java#L240
>>
>> On 15.04.25 07:46, Lincoln Lee wrote:
>> > +1 for the FLIP, the snapshot and changelog related PTFs are a nice
>> > addition to Flink SQL!
>> >
>> > I have a question about the naming of SEARCH_KEY. Considering that it
>> > is a simplified alternative to a lookup join, and that the word
>> > 'search' may be more likely to remind users of keyword searching,
>> > which is a bit different from joins in SQL, would it be better to
>> > follow the naming of lookup, e.g.,
>> > ```
>> > SELECT *
>> > FROM
>> >    t1,
>> >    LATERAL LOOKUP(
>> >      table => dim,
>> >      key => DESCRIPTOR(k1,k2),
>> >      t1, t2...)
>> > ```
>> >
>> >
>> > Best,
>> > Lincoln Lee
>> >
>> >
>> > Shengkai Fang <fskm...@gmail.com> wrote on Tue, Apr 15, 2025 at 10:12:
>> >
>> >> Thanks for the FLIP, it helps us a lot in developing features like
>> >> VECTOR_SEARCH. But I have some questions about the FLIP:
>> >>
>> >> 1. Timing of Option Consumption for SEARCH_KEY Parameters
>> >> For the FOR SYSTEM_TIME AS OF syntax, the planner leverages hints and
>> >> catalog tables to load the table scan during the toRel phase. However,
>> if
>> >> users use the SEARCH_KEY function, is the planner able to consume these
>> >> options at the same early stage?
>> >>
>> >> 2. Handling Column Name Conflicts in SEARCH_KEY Output Schema
>> >> What is the output schema behavior for the SEARCH_KEY function when the
>> >> left and right tables have columns with conflicting names?
>> >> How can users resolve these name conflicts to ensure unambiguous
>> access to
>> >> the desired columns?
>> >>
>> >> 3. Details on Correlated PTFs
>> >> Could you elaborate on correlated PTFs? What is the API design for
>> >> implementing correlated PTFs? How does a PTF retrieve the required
>> model
>> >> and lookup function during execution?
>> >>
>> >> 4. Specifying Literal Values in SEARCH_KEY Function
>> >> How can users include literal values in the SEARCH_KEY function
>> parameters?
>> >> The FOR SYSTEM_TIME AS OF syntax allows users to define complex
>> temporal
>> >> conditions. Does SEARCH_KEY support equivalent flexibility for
>> specifying
>> >> dynamic or literal values in its parameters?
>> >>
>> >> Best,
>> >> Shengkai
>> >>
>> >>
>> >>
>> >> Hao Li <h...@confluent.io.invalid> wrote on Thu, Apr 3, 2025 at 00:12:
>> >>
>> >>> Hi Timo,
>> >>>
>> >>> One question I have is: what is the SEARCH_KEY result schema you have
>> >>> in mind? Can it output multiple rows for every row in the left table,
>> >>> or does it need to pack the result into a single row as an array?
>> >>>
>> >>> Thanks,
>> >>> Hao
>> >>>
>> >>> On Mon, Mar 24, 2025 at 10:20 AM Hao Li <h...@confluent.io> wrote:
>> >>>
>> >>>> Thanks Timo for the FLIP! This is a great improvement to the Flink
>> >>>> SQL syntax around tables. I have two clarification questions:
>> >>>>
>> >>>> 1. For SEARCH_KEY
>> >>>> ```
>> >>>> SELECT *
>> >>>> FROM
>> >>>>    t_other,
>> >>>>    LATERAL SEARCH_KEY(
>> >>>>      input => t,
>> >>>>      on_key => DESCRIPTOR(k),
>> >>>>      lookup => t_other.name,
>> >>>>      options => MAP[
>> >>>>        'async', 'true',
>> >>>>        'retry-predicate', 'lookup_miss',
>> >>>>        'retry-strategy', 'fixed_delay',
>> >>>>        'fixed-delay', '10s'
>> >>>>      ]
>> >>>>    )
>> >>>> ```
>> >>>> Table `t` needs to be an existing `LookupTableSource` [1], right? And
>> >> we
>> >>>> will rewrite it to `StreamPhysicalLookupJoin` [2] or a similar
>> >>>> operator during the physical optimization phase.
>> >>>> Also to support passing options, we need to extend `LookupContext`
>> [3]
>> >> to
>> >>>> have a `getOptions` or `getRuntimeOptions` method?
>> >>>>
>> >>>> 2. For FROM_CHANGELOG
>> >>>> ```
>> >>>> SELECT * FROM FROM_CHANGELOG(s) AS t;
>> >>>> ```
>> >>>> Do we need to introduce a `DataStream` resource in SQL first?
>> >>>>
>> >>>>
>> >>>> Hao
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>>
>> >>>
>> >>
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java
>> >>>> [2]
>> >>>>
>> >>>>
>> >>>
>> >>
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41
>> >>>> [3]
>> >>>>
>> >>>>
>> >>>
>> >>
>> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82
>> >>>>
>> >>>>
>> >>>> On Fri, Mar 21, 2025 at 6:25 AM Timo Walther <twal...@apache.org>
>> >> wrote:
>> >>>>
>> >>>>> Hi everyone,
>> >>>>>
>> >>>>> I would like to start a discussion about FLIP-517: Better Handling
>> of
>> >>>>> Dynamic Table Primitives with PTFs [1].
>> >>>>>
>> >>>>> In the past months, I have spent a significant amount of time with
>> SQL
>> >>>>> semantics and the SQL standard around PTFs, when designing and
>> >>>>> implementing FLIP-440 [2]. For those of you that have not taken a
>> look
>> >>>>> into the standard, the concept of Polymorphic Table Functions (PTF)
>> >>>>> enables syntax for implementing custom SQL operators. In my opinion,
>> >>>>> they are kind of a revolution in the SQL language. PTFs can take
>> >> scalar
>> >>>>> values, tables, models (in Flink), and column lists as arguments.
>> >>>>> With these primitives, we can address shortcomings in the Flink SQL
>> >>>>> language by leveraging syntax and semantics.
>> >>>>>
>> >>>>> I would like to introduce a couple of built-in PTFs with the goal to make
>> >>>>> the handling of dynamic tables easier for users. Once users
>> understand
>> >>>>> how a PTF works, they can easily select from a list of functions to
>> >>>>> approach a table for snapshots, changelogs, or searching.
>> >>>>>
>> >>>>> The FLIP proposes:
>> >>>>>
>> >>>>> SNAPSHOT()
>> >>>>> SEARCH_KEY()
>> >>>>> TO_CHANGELOG()
>> >>>>> FROM_CHANGELOG()
>> >>>>>
>> >>>>> I'm aware that this is a delicate topic, and might lead to
>> >> controversial
>> >>>>> discussions. I hope with concise naming and syntax the benefit over
>> >> the
>> >>>>> existing syntax becomes clear.
>> >>>>>
>> >>>>> There are more useful PTFs to come, but those are the ones that I
>> >>>>> currently see as the most fundamental ones to tell a round story
>> >> around
>> >>>>> Flink SQL.
>> >>>>>
>> >>>>> Looking forward to your feedback.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Timo
>> >>>>>
>> >>>>> [1]
>> >>>>>
>> >>>>>
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs
>> >>>>> [2]
>> >>>>>
>> >>>
>> >>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
