Hi Danny,

Thanks for the updates. I have 2 comments regarding the latest document:
1) I think we also need `supportedHintOptions` for `TableFormatFactory`
2) IMO "dynamic-table-options.enabled" should belong to `OptimizerConfigOptions`

Best,
Kurt

On Thu, Mar 26, 2020 at 4:40 PM Timo Walther <twal...@apache.org> wrote:

> Thanks for the update Danny. +1 for this proposal.
>
> Regards,
> Timo
>
> On 26.03.20 04:51, Danny Chan wrote:
> > Thanks everyone who engaged in this discussion ~
> >
> > Our goal is "Supports Dynamic Table Options for Flink SQL". After an offline discussion with Kurt, Timo and Dawid, we have reached a final conclusion. Here is the summary:
> >
> > - Use comment-style syntax to specify the dynamic table options: "/*+ OPTIONS(k1='v1', k2='v2') */"
> > - Constrain the option keys: options that may bring in security problems should not be allowed, e.g. the Kafka connector's ZooKeeper endpoint URL and topic name
> > - Use a white-list to control the allowed options for each connector, which is safer for future extension
> > - Allow enabling/disabling this feature globally
> > - Implement it on the current code base first, and when FLIP-95 is checked in, implement this feature on the new interface
> >
> > Any suggestions are appreciated ~
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
> >
> > Best,
> > Danny Chan
> >
> > Jark Wu <imj...@gmail.com> wrote on Wed, Mar 18, 2020 at 10:38 PM:
> >
> >> Hi everyone,
> >>
> >> Sorry, but I'm not sure about the `supportedHintOptions`. I'm afraid it doesn't solve the problems but increases the development and learning burden.
> >>
> >> # increase development and learning burden
> >>
> >> According to the discussion so far, we want to support overriding, in hints, a subset of options which doesn't affect semantics.
> >> With the `supportedHintOptions`, it's up to the connector developers to decide which options will not affect semantics and can therefore be hint options.
> >> However, the question is how to distinguish whether an option will *affect semantics*. What happens if an option affects semantics but is provided as a hint option?
> >> From my point of view, it's not easy to distinguish. For example, "format.ignore-parse-error" can be a very useful dynamic option, but it will affect semantics, because the result is different (null vs. exception).
> >> Another example: the "connector.lookup.cache.*" options are also very useful for tuning jobs; however, they will also affect the job results. I can come up with many more options that are useful but may affect semantics.
> >>
> >> I can see that the community will be stuck in endless discussions around "can this option be a hint option?" and "will this option affect semantics?".
> >> You can also see that we already have different opinions on "ignore-parse-error". Those discussions are a waste of time! That's not what users want!
> >> The problem is that users need this, this, and this option, and HOW do we expose them? We should focus on that.
> >>
> >> Then there could be two endings in the future:
> >> 1) We compromise for the sake of usability: we drop the rule that hints don't affect semantics and allow all the useful options in the hints list.
> >> 2) We stick to the rule, and users will find this a stumbling feature which doesn't solve their problems.
> >>    And they will be surprised that this option can't be set while the other can. *Semantics* is hard for users to understand.
> >>
> >> # doesn't solve the problems
> >>
> >> I think the purpose of this FLIP is to allow users to quickly override some connectors' properties to tune their jobs.
> >> However, `supportedHintOptions` is off track.
> >> It only allows a subset of options, and for users it's not *clear* which subset is allowed.
> >>
> >> Besides, I'm not sure `supportedHintOptions` can work well for all cases.
> >> How could you support Kafka properties (`connector.properties.*`) as hint options? Some Kafka properties may affect semantics (bootstrap.servers), some may not (max.poll.records). Besides, I think it's not possible to list all the possible Kafka properties [1].
> >>
> >> In summary, IMO, `supportedHintOptions`:
> >> (1) increases the complexity of developing a connector;
> >> (2) confuses users about which options can be used in hints and which cannot, so they have to check the docs again and again;
> >> (3) doesn't solve the problems which we want to solve with this FLIP.
> >>
> >> I think we should avoid introducing partial solutions. Otherwise, we will be stuck in a loop of introduce new API -> deprecate API -> introduce new API....
> >>
> >> I am personally in favor of an explicit WITH syntax after the table as a part of the query, which was mentioned by Kurt before, e.g. SELECT * from T WITH('key' = 'value').
> >> It allows users to dynamically set options which can affect semantics. It will be flexible enough to solve the users' problems raised so far.
> >>
> >> Best,
> >> Jark
> >>
> >> [1]: https://kafka.apache.org/documentation/#consumerconfigs
> >>
> >> On Wed, 18 Mar 2020 at 21:44, Danny Chan <yuzhao....@gmail.com> wrote:
> >>
> >>> My POC is here for the hints options merge [1].
> >>>
> >>> Personally, I have no strong objections to splitting hints from the CatalogTable; the only con is a more complex implementation, but the concept is clearer, and I have updated the WIKI.
> >>>
> >>> I think it would be nice if we can support the format “ignore-parse error” option key. The CSV source already has a key [2] and we can use that in the supportedHintOptions; for the common CSV and JSON formats, we can also add support. This is the only kind of key in formats that “does not change the semantics” (somehow). What do you think about this ~
> >>>
> >>> [1] https://github.com/danny0405/flink/commit/5d925fa16c3c553423c4b7d93001521b8e6e6bee#diff-6e569a6dd124fd2091c18e2790fb49c5
> >>> [2] https://github.com/apache/flink/blob/b83060dff6d403b6994b6646b3f29a374f599530/flink-table/flink-table-api-java-bridge/src/main/java/org/apache/flink/table/sources/CsvTableSourceFactoryBase.java#L92
> >>>
> >>> Best,
> >>> Danny Chan
> >>> On Mar 18, 2020, 9:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>> Hi everyone,
> >>>>
> >>>> +1 to Kurt's suggestion. Let's just have it in source and sink factories for now. We can still move this method up in the future. Currently, I don't see a need for catalogs or formats. Because how would you target a format in the query?
> >>>>
> >>>> @Danny: Can you send a link to your PoC? I'm very skeptical about creating a new CatalogTable in the planner. Actually, CatalogTable should be immutable between Catalog and Factory, because a catalog can return its own factory and fully control the instantiation. Depending on the implementation, that means the catalog may have encoded more information in a concrete subclass implementing the interface. I vote for explicitly separating the concerns of catalog information and hints in the factory.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 18.03.20 05:41, Jingsong Li wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I am thinking we can provide hints to *table* related instances.
> >>>>> - TableFormatFactory: of course we need hints support; there are many format options in DDL too.
> >>>>> - catalog and module: I don't know, maybe in the future we can provide some hints for them.
> >>>>>
> >>>>> Best,
> >>>>> Jingsong Lee
> >>>>>
> >>>>> On Wed, Mar 18, 2020 at 12:28 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>
> >>>>>> Yes, I think we should move the `supportedHintOptions` from TableFactory to TableSourceFactory, and we also need to add the interface to TableSinkFactory, because the sink target table may also have hints attached.
> >>>>>>
> >>>>>> Best,
> >>>>>> Danny Chan
> >>>>>> On Mar 18, 2020, 11:08 AM +0800, Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>> I have one question about adding the `supportedHintOptions` method to `TableFactory`. It seems `TableFactory` is a base factory interface for all *table module* related instances, such as catalog, module, format and so on. It's not created only for *table*. Is it possible to move it to `TableSourceFactory`?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Kurt
> >>>>>>>
> >>>>>>>
> >>>>>>> On Wed, Mar 18, 2020 at 10:59 AM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Thanks Timo ~
> >>>>>>>>
> >>>>>>>> For the naming itself, I also think PROPERTIES is not that concise, so +1 for OPTIONS (I had thought about that, but a lot of code in current Flink calls them properties, e.g. the DescriptorProperties, #getSupportedProperties); let’s use OPTIONS if this is our new preference.
> >>>>>>>>
> >>>>>>>> +1 to `Set<ConfigOption> supportedHintOptions()` because the ConfigOption can carry more info. AFAIK, Spark also calls them table options instead of properties.
> >>>>>>>> [1]
> >>>>>>>>
> >>>>>>>> In my local POC, I did create a new CatalogTable, and it works well for the current connectors: all the DDL tables finally yield a CatalogTable instance and we can apply the options to that (in the CatalogSourceTable when we generate the TableSource). The pro is that we do not need to modify the code of the connectors themselves. If we split the options from CatalogTable, we may need to add some additional logic in each connector factory in order to merge these properties (and the logic is almost the same). What do you think about this?
> >>>>>>>>
> >>>>>>>> [1] https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-table.html
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Danny Chan
> >>>>>>>> On Mar 17, 2020, 10:10 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>> Hi Danny,
> >>>>>>>>>
> >>>>>>>>> thanks for updating the FLIP. I think your current design is sufficient to separate hints from result-related properties.
> >>>>>>>>>
> >>>>>>>>> One remark on the naming itself: I would vote for calling the hints around table scan `OPTIONS('k'='v')`. We used the term "properties" in the past but since we want to unify the Flink configuration experience, we should use consistent naming and classes around `ConfigOptions`.
> >>>>>>>>>
> >>>>>>>>> It would be nice to use `Set<ConfigOption> supportedHintOptions();` to start using config options instead of pure string properties. This will also allow us to generate documentation in the future around supported data types, ranges, etc. for options. At some point we would also like to drop the `DescriptorProperties` class. "Options" is also used in the documentation [1] and in the SQL/MED standard [2].
> >>>>>>>>>
> >>>>>>>>> Furthermore, I would still vote for separating CatalogTable and hint options. Otherwise the planner would need to create a new CatalogTable instance, which might not always be easy. We should offer them via:
> >>>>>>>>>
> >>>>>>>>> org.apache.flink.table.factories.TableSourceFactory.Context#getHints: ReadableConfig
> >>>>>>>>>
> >>>>>>>>> What do you think?
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/create.html#create-table
> >>>>>>>>> [2] https://wiki.postgresql.org/wiki/SQL/MED
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 12.03.20 15:06, Stephan Ewen wrote:
> >>>>>>>>>> @Danny sounds good.
> >>>>>>>>>>
> >>>>>>>>>> Maybe it is worth listing all the classes of problems that you want to address and then looking at each class to see if hints are a good default solution or a good optional way of simplifying things?
> >>>>>>>>>> The discussion has grown a lot and it is starting to be hard to distinguish the parts where everyone agrees from the parts where there are concerns.
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Mar 12, 2020 at 2:31 PM Danny Chan <danny0...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Stephan ~
> >>>>>>>>>>>
> >>>>>>>>>>> We can remove the support for properties that may change the semantics of a query if you think that is trouble.
> >>>>>>>>>>>
> >>>>>>>>>>> How about we support the /*+ properties() */ hint only for optimizer parameters, such as the fetch size of a source or something like that? Does that make sense?
> >>>>>>>>>>>
> >>>>>>>>>>> Stephan Ewen <se...@apache.org> wrote on Thu, Mar 12, 2020 at 7:45 PM:
> >>>>>>>>>>>
> >>>>>>>>>>>> I think Bowen has actually put it very well.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (1) Hints that change semantics look like trouble waiting to happen. For example, Kafka offset handling should be in filters. The Kafka source should support predicate pushdown.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (2) Hints should not be a workaround for current shortcomings. A lot of what is suggested above sounds exactly like that: working around catalog/DDL shortcomings, missing exposure of metadata (offsets), missing predicate pushdown in Kafka. Abusing a feature like hints now as a quick fix for these issues, rather than fixing the root causes, will very likely come back to bite us badly in the future.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Stephan
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Mar 12, 2020 at 10:43 AM Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> It seems this FLIP's name is somewhat misleading. From my understanding, this FLIP is trying to address the dynamic parameter issue, and table hints are the way we want to choose.
> >>>>>>>>>>>>> I think we should focus on "what's the right way to solve dynamic property" instead of discussing "whether table hints can affect query semantics".
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For now, there are three proposed ways to achieve dynamic property:
> >>>>>>>>>>>>> 1. FLIP-110: create temporary table xx like xx with (xxx)
> >>>>>>>>>>>>> 2. use custom "from t with (xxx)" syntax
> >>>>>>>>>>>>> 3. "Borrow" the table hints to have a special PROPERTIES hint.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The first one doesn't break anything; the only problem I see is that it is a little more verbose than the table hint approach. I can imagine that when someone uses the SQL CLI interactively, he will quite often modify the table properties. Some use cases I can think of:
> >>>>>>>>>>>>> 1. The source contains some corrupted data, and I want to turn on the "ignore-error" flag for certain formats.
> >>>>>>>>>>>>> 2. I have a Kafka table and want to see some sample data from the beginning, so I change the offset to "earliest", and then I want to observe the latest data which keeps coming in. I would write another query to select from the latest table.
> >>>>>>>>>>>>> 3. I want my JDBC sink to flush data more eagerly so that I can observe the data from the database side.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Most of such use cases are quite ad-hoc. If every time I want a different experience I need to create a temporary table and then also modify my query, it doesn't feel smooth.
> >>>>>>>>>>>>> Embedding such dynamic properties into the query would give a better user experience.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Both 2 & 3 can make this happen. The con of #2 is breaking SQL compliance, while #3 only breaks some unwritten rules, and we can have an explanation for that. I really doubt whether users would complain about this when they actually have a flexible and good experience using it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> My tendency would be #3 > #1 > #2, what do you think?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Mar 12, 2020 at 1:11 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks Aljoscha ~
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I agree that most query hints are optional, as optimizer instructions, especially in traditional RDBMSs.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> But, just like BenChao said, Flink as a computation engine has many different kinds of data sources; thus, dynamic parameters like start_offset can only bind to each table scope. We cannot set a session config like KSQL does, because theirs are all about Kafka:
> >>>>>>>>>>>>>>> SET 'auto.offset.reset'='earliest';
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thus the most flexible way to set up these dynamic params is to bind them to the table scope in the query when we want to override something, so we have the solutions above (with pros and cons from my side):
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> • 1. Select * from t(offset=123) (from Timo)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>>> - Parameters are part of the main query
> >>>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>>> - Not SQL compliant
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> • 2. Select * from t /*+ PROPERTIES(offset=123) */ (from me)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>>> - SQL compliant because it is nested in the comments
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>>> - Parameters are not part of the main query
> >>>>>>>>>>>>>> - Cryptic syntax for new users
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The biggest problem with the hints way may be the question of “whether hints must be optional”. Actually, we thought about 1 for a while but abandoned it because it breaks the SQL standard too much. We replaced it with 2 because the hint syntax does not break the SQL standard (it is nested in comments).
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> What if we have a special /*+ PROPERTIES */ hint that allows overriding some properties of a table dynamically? It does not break anything, at least for current Flink use cases.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Planner hints are optional just because they are naturally enforcers of the planner; most of them aim to instruct the optimizer. But table hints are a little different: table hints can specify table metadata like an index column, and they are a very convenient way to specify table properties.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Or shall we not call /*+ PROPERTIES(offset=123) */ a table hint? We can call it table dynamic parameters.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>> On Mar 11, 2020, 9:20 PM +0800, Aljoscha Krettek <aljos...@apache.org> wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I don't understand this discussion. Hints, as I understand them, should work like this:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> - hints are *optional* advice for the optimizer to try and help it to find a good execution strategy
> >>>>>>>>>>>>>>> - hints should not change query semantics, i.e.
> >>>>>>>>>>>>>>> they should not change connector properties; executing a query while taking the hints into account *must* produce the same result as executing the query without taking the hints into account
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> From these simple requirements you can derive a solution that makes sense. I don't have a strong preference for the syntax but we should strive to be in line with prior work.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Aljoscha
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 11.03.20 11:53, Danny Chan wrote:
> >>>>>>>>>>>>>>>> Thanks Timo for summarizing the 3 options ~
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I agree with Kurt that option 2 is too complicated to use because:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> • As a Kafka topic consumer, the user must define the virtual column for the start offset and must also apply a special filter predicate in each query
> >>>>>>>>>>>>>>>> • For the internal implementation, the metadata column push down is another hard topic: each kind of message queue may have its own offset attribute, and we need to consider the expression type for each kind; the source also needs to recognize the constant column as a config option (which is weird because usually what we push down is a table column)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> For option 1 and option 3, I think there is no difference; option 1 is
> >>>>>>>>>>>>>>>> also a hint syntax, which was introduced in Sybase and adopted, then deprecated, by MS-SQL in the 1990s because of its ambiguity. Personally I prefer the /*+ */ style table hint over the WITH keyword for these reasons:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> • We do not break standard SQL; the hints are nested in SQL comments
> >>>>>>>>>>>>>>>> • We do not need to introduce an additional WITH keyword, which could appear anywhere in a query if we used it, because a table can be referenced in all kinds of SQL contexts: INSERT/DELETE/FROM/JOIN …. That would make our SQL deviate too much from the standard
> >>>>>>>>>>>>>>>> • We would have a uniform syntax for table hints and query hints; one syntax fits all and is easier to use
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> And here is the reason why we chose a uniform Oracle-style query hint syntax, as explained by Julian Hyde when we designed the syntax in the Calcite community:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I don’t much like the MSSQL-style syntax for table hints. It adds a new use of the WITH keyword that is unrelated to the use of WITH for common-table expressions.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> A historical note. Microsoft SQL Server inherited its hint syntax from Sybase a very long time ago.
> >>>>>>>>>>>>>>>> (See “Transact SQL Programming” [1], page 632, “Optimizer hints”. The book was written in 1999, and covers Microsoft SQL Server 6.5 / 7.0 and Sybase Adaptive Server 11.5, but the syntax very likely predates Sybase 4.3, from which Microsoft SQL Server was forked in 1993.)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Microsoft later added the WITH keyword to make it less ambiguous, and has now deprecated the syntax that does not use WITH.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> They are forced to keep the syntax for backwards compatibility but that doesn’t mean that we should shoulder their burden.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I think formatted comments are the right container for hints because it allows us to change the hint syntax without changing the SQL parser, and makes clear that we are at liberty to ignore hints entirely.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Julian
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> [1] https://www.amazon.com/s?k=9781565924017
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>> On Mar 11, 2020, 4:03 PM +0800, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>>>>>>> Hi Danny,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> it is true that our DDL is not standard compliant by using the WITH clause. Nevertheless, we aim for not diverging too much and the LIKE clause is an example of that.
> >>>>>>>>>>>>>>>>> It will solve things like overwriting WATERMARKs, adding/modifying properties, and inheriting the schema.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Bowen is right that Flink's DDL is mixing 3 types of definitions together. We are not the first ones that try to solve this. There is also the SQL MED standard [1] that tried to tackle this problem. I think it was not considered when designing the current DDL.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Currently, I see 3 options for handling Kafka offsets. I will give some examples and look forward to feedback here:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> *Option 1* Runtime and semantic params as part of the query
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> `SELECT * FROM MyTable('offset'=123)`
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>>>>>> - Parameters are part of the main query
> >>>>>>>>>>>>>>>>> - No complicated hinting syntax
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>>>>>> - Not SQL compliant
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> *Option 2* Use metadata in query
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> `CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> `SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>>>>>> - SQL compliant in the query
> >>>>>>>>>>>>>>>>> - Access of metadata in the DDL which is required anyway
> >>>>>>>>>>>>>>>>> - Regular pushdown rules apply
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>>>>>> - Users need to add an additional column in the DDL
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> *Option 3*: Use hints for properties
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> `
> >>>>>>>>>>>>>>>>> SELECT *
> >>>>>>>>>>>>>>>>> FROM MyTable /*+ PROPERTIES('offset'=123) */
> >>>>>>>>>>>>>>>>> `
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Pros:
> >>>>>>>>>>>>>>>>> - Easy to add
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Cons:
> >>>>>>>>>>>>>>>>> - Parameters are not part of the main query
> >>>>>>>>>>>>>>>>> - Cryptic syntax for new users
> >>>>>>>>>>>>>>>>> - Not standard compliant.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If we go with this option, I would suggest making the hints available in a separate map and not mixing them with statically defined properties, such that the factory can decide which properties have the right to be overwritten by the hints:
> >>>>>>>>>>>>>>>>> TableSourceFactory.Context.getQueryHints(): ReadableConfig
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>>> Timo
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> [1] https://en.wikipedia.org/wiki/SQL/MED
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Currently I see 3 options as a
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 11.03.20 07:21, Danny Chan wrote:
> >>>>>>>>>>>>>>>>>> Thanks Bowen ~
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I agree we should somehow categorize our connector parameters.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> For type1, I’m already preparing a solution like the Confluent schema registry + Avro schema inference thing, so this may not be a problem in the near future.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> For type3, I have some questions:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> "SELECT * FROM mykafka WHERE offset > 12pm yesterday"
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Where does the offset column come from? Is it a virtual column from the table schema? You said that
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> They change almost every time a query starts and have nothing to do with metadata, thus should not be part of table definition/DDL
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> But then why can you reference it in the query? I’m confused about that; can you elaborate a little?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>> Danny Chan
> >>>>>>>>>>>>>>>>>> On Mar 11, 2020, 12:52 PM +0800, Bowen Li <bowenl...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>> Thanks Danny for kicking off the effort
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> The root cause of too much manual work is that Flink DDL has mixed 3 types of params together and doesn't handle each of them very well. Below is how I categorize them and the corresponding solutions in my mind:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> - type 1: Metadata of external data, like external endpoint/url, username/pwd, schemas, formats.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Such metadata are mostly already accessible in external systems as long as endpoints and credentials are provided.
> >>> Flink can > >>>>>>>> get it > >>>>>>>>>>> thru > >>>>>>>>>>>>>> catalogs, but > >>>>>>>>>>>>>>>>>>> we haven't had many catalogs yet and thus > >>> Flink > >>>>>> just > >>>>>>>> hasn't > >>>>>>>>>>>>> been > >>>>>>>>>>>>>> able to > >>>>>>>>>>>>>>>>>>> leverage that. So the solution should be > >>> building > >>>>>>>> more > >>>>>>>>>>>>> catalogs. > >>>>>>>>>>>>>> Such > >>>>>>>>>>>>>>>>>>> params should be part of a Flink table > >>>>>>>> DDL/definition, and > >>>>>>>>>>>> not > >>>>>>>>>>>>>> overridable > >>>>>>>>>>>>>>>>>>> in any means. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> - type 2: Runtime params, like jdbc > >>> connector's > >>>>>>>> fetch size, > >>>>>>>>>>>>>> elasticsearch > >>>>>>>>>>>>>>>>>>> connector's bulk flush size. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Such params don't affect query results, > >> but > >>>>>> affect > >>>>>>>> how > >>>>>>>>>>>> results > >>>>>>>>>>>>>> are produced > >>>>>>>>>>>>>>>>>>> (eg. fast or slow, aka performance) - > >> they > >>> are > >>>>>>>> essentially > >>>>>>>>>>>>>> execution and > >>>>>>>>>>>>>>>>>>> implementation details. They change often > >>> in > >>>>>>>> exploration or > >>>>>>>>>>>>>> development > >>>>>>>>>>>>>>>>>>> stages, but not quite frequently in > >>> well-defined > >>>>>>>>>>> long-running > >>>>>>>>>>>>>> pipelines. > >>>>>>>>>>>>>>>>>>> They should always have default values > >> and > >>> can be > >>>>>>>> missing > >>>>>>>>>>> in > >>>>>>>>>>>>>> query. They > >>>>>>>>>>>>>>>>>>> can be part of a table DDL/definition, > >> but > >>> should > >>>>>>>> also be > >>>>>>>>>>>>>> replaceable in a > >>>>>>>>>>>>>>>>>>> query - *this is what table "hints" in > >>> FLIP-113 > >>>>>>>> should > >>>>>>>>>>>> cover*. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> - type 3: Semantic params, like kafka > >>> connector's > >>>>>>>> start > >>>>>>>>>>>> offset. 
> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Such params affect query results - the > >>> semantics. > >>>>>>>> They'd > >>>>>>>>>>>> better > >>>>>>>>>>>>>> be as > >>>>>>>>>>>>>>>>>>> filter conditions in WHERE clause that > >> can > >>> be > >>>>>> pushed > >>>>>>>> down. > >>>>>>>>>>>> They > >>>>>>>>>>>>>> change > >>>>>>>>>>>>>>>>>>> almost every time a query starts and have > >>>>>> nothing to > >>>>>>>> do > >>>>>>>>>>> with > >>>>>>>>>>>>>> metadata, thus > >>>>>>>>>>>>>>>>>>> should not be part of table > >>> definition/DDL, nor > >>>>>> be > >>>>>>>>>>> persisted > >>>>>>>>>>>> in > >>>>>>>>>>>>>> catalogs. > >>>>>>>>>>>>>>>>>>> If they will, users should create views > >> to > >>> keep > >>>>>> such > >>>>>>>> params > >>>>>>>>>>>>>> around (note > >>>>>>>>>>>>>>>>>>> this is different from variable > >>> substitution). > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Take Flink-Kafka as an example. Once we > >>> get these > >>>>>>>> params > >>>>>>>>>>>> right, > >>>>>>>>>>>>>> here're the > >>>>>>>>>>>>>>>>>>> steps users need to do to develop and run > >>> a Flink > >>>>>>>> job: > >>>>>>>>>>>>>>>>>>> - configure a Flink > >>> ConfluentSchemaRegistry with > >>>>>> url, > >>>>>>>>>>>> username, > >>>>>>>>>>>>>> and password > >>>>>>>>>>>>>>>>>>> - run "SELECT * FROM mykafka WHERE offset > >>>> 12pm > >>>>>>>> yesterday" > >>>>>>>>>>>>>> (simplified > >>>>>>>>>>>>>>>>>>> timestamp) in SQL CLI, Flink > >> automatically > >>>>>> retrieves > >>>>>>>> all > >>>>>>>>>>>>>> metadata of > >>>>>>>>>>>>>>>>>>> schema, file format, etc and start the > >> job > >>>>>>>>>>>>>>>>>>> - users want to make the job read Kafka > >>> topic > >>>>>>>> faster, so it > >>>>>>>>>>>>> goes > >>>>>>>>>>>>>> as "SELECT > >>>>>>>>>>>>>>>>>>> * FROM mykafka /* faster_read_key=value*/ > >>> WHERE > >>>>>>>> offset > > >>>>>>>>>>> 12pm > >>>>>>>>>>>>>> yesterday" > >>>>>>>>>>>>>>>>>>> - done and satisfied, users submit it to > >>>>>> production > >>>>>>>>>>>>>>>>>>> 
> >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Regarding "CREATE TABLE t LIKE with > >> (k1=v1, > >>>>>> k2=v2), > >>>>>>>> I think > >>>>>>>>>>>>> it's > >>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>> nice-to-have feature, but not a > >>> strategically > >>>>>>>> critical, > >>>>>>>>>>>>>> long-term solution, > >>>>>>>>>>>>>>>>>>> because > >>>>>>>>>>>>>>>>>>> 1) It may seem promising at the current > >>> stage to > >>>>>>>> solve the > >>>>>>>>>>>>>>>>>>> too-much-manual-work problem, but that's > >>> only > >>>>>>>> because Flink > >>>>>>>>>>>>>> hasn't > >>>>>>>>>>>>>>>>>>> leveraged catalogs well and handled the 3 > >>> types > >>>>>> of > >>>>>>>> params > >>>>>>>>>>>> above > >>>>>>>>>>>>>> properly. > >>>>>>>>>>>>>>>>>>> Once we get the params types right, the > >>> LIKE > >>>>>> syntax > >>>>>>>> won't > >>>>>>>>>>> be > >>>>>>>>>>>>> that > >>>>>>>>>>>>>>>>>>> important, and will be just an easier way > >>> to > >>>>>> create > >>>>>>>> tables > >>>>>>>>>>>>>> without retyping > >>>>>>>>>>>>>>>>>>> long fields like username and pwd. > >>>>>>>>>>>>>>>>>>> 2) Note that only some rare type of > >>> catalog can > >>>>>>>> store k-v > >>>>>>>>>>>>>> property pair, so > >>>>>>>>>>>>>>>>>>> table created this way often cannot be > >>>>>> persisted. In > >>>>>>>> the > >>>>>>>>>>>>>> foreseeable > >>>>>>>>>>>>>>>>>>> future, such catalog will only be > >>> HiveCatalog, > >>>>>> and > >>>>>>>> not > >>>>>>>>>>>> everyone > >>>>>>>>>>>>>> has a Hive > >>>>>>>>>>>>>>>>>>> metastore. To be honest, without > >>> persistence, > >>>>>>>> recreating > >>>>>>>>>>>> tables > >>>>>>>>>>>>>> every time > >>>>>>>>>>>>>>>>>>> this way is still a lot of keyboard > >> typing. 
> >>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> If a specific connector wants to have such a parameter and read it out of the configuration, then that's fine.
> >>>>>>>>>>>>>>>>>>>> If we are talking about a configuration for all kinds of sources, I would be super careful about that.
> >>>>>>>>>>>>>>>>>>>> It's true it can solve maybe 80% of the cases, but it will also make the remaining 20% feel weird.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi Kurt,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> #3 Regarding the global offset:
> >>>>>>>>>>>>>>>>>>>>> I'm not saying the planner should use the global configuration to override connector properties.
> >>>>>>>>>>>>>>>>>>>>> Rather, the connector should take this configuration and translate it into its client API.
> >>>>>>>>>>>>>>>>>>>>> AFAIK, almost all message queues support earliest, latest, and a timestamp value as the start point.
> >>>>>>>>>>>>>>>>>>>>> So we can support 3 options for this configuration: "earliest", "latest" and a timestamp string value.
> >>>>>>>>>>>>>>>>>>>>> Of course, this can't solve 100% of the cases, but I guess it can solve 80% or 90% of them.
> >>>>>>>>>>>>>>>>>>>>> And the remaining cases, which I guess are not very common, can be resolved by the LIKE syntax.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> Jark
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, 11 Mar 2020 at 10:33, Kurt Young <ykt...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Good to have such lovely discussions. I also want to share some of my opinions.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> #1 Regarding error handling: I also think ignoring invalid hints would be dangerous; maybe the simplest solution is to just throw an exception.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> #2 Regarding property replacement: I don't think we should constrain ourselves to the meaning of the word "hint" and forbid it from modifying any properties which can affect query results. IMO `PROPERTIES` is one of the table hints, and a powerful one. It can modify properties located in the DDL's WITH block.
> >>>>>>>>>>>>>>>>>>>>>> But I also see the harm if we make it too flexible, like changing the Kafka topic name with a hint. Such a use case is not common and sounds very dangerous to me. I would propose we have a map of hintable properties for each connector, and we should validate that all passed-in properties are actually hintable. Combined with the #1 error handling, we can throw an exception once we receive an invalid property.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> #3 Regarding the global offset: I'm not sure it's feasible. Different connectors will have totally different properties to represent the offset: some might be timestamps, some might be string literals like "earliest", and others might be just integers.
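A minimal sketch of the whitelist idea from #2 combined with the exception-based error handling from #1. This is only an illustration under stated assumptions: `HintOptionMerger` is a hypothetical helper, not a Flink class, and the set of hintable keys is assumed to be supplied by each connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Flink): checks hint options against a
// per-connector whitelist and then overlays them on the DDL properties.
final class HintOptionMerger {
    private HintOptionMerger() {}

    static Map<String, String> merge(
            Map<String, String> ddlProperties,
            Map<String, String> hintOptions,
            Set<String> hintableKeys) {
        // Reject the whole hint as soon as one key is not whitelisted,
        // instead of silently ignoring it.
        for (String key : hintOptions.keySet()) {
            if (!hintableKeys.contains(key)) {
                throw new IllegalArgumentException(
                        "Property '" + key + "' cannot be set through a hint");
            }
        }
        // Copy the DDL properties so the catalog's definition is untouched,
        // then let the validated hint options override them.
        Map<String, String> merged = new LinkedHashMap<>(ddlProperties);
        merged.putAll(hintOptions);
        return merged;
    }
}
```

With a whitelist that contains only an illustrative key such as 'connector.startup-mode', a hint on that key overrides the DDL value, while a hint on 'connector.topic' fails with an exception.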
> >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>>> Kurt > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 10, 2020 at 11:46 PM > >>> Jark Wu < > >>>>>>>>>>>>> imj...@gmail.com> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I want to jump in the discussion > >>> about > >>>>>> the > >>>>>>>> "dynamic > >>>>>>>>>>>>>> start offset" > >>>>>>>>>>>>>>>>>>>>>> problem. > >>>>>>>>>>>>>>>>>>>>>>> First of all, I share the same > >>> concern > >>>>>> with > >>>>>>>> Timo > >>>>>>>>>>> and > >>>>>>>>>>>>>> Fabian, that the > >>>>>>>>>>>>>>>>>>>>>>> "start offset" affects the query > >>>>>> semantics, > >>>>>>>> i.e. > >>>>>>>>>>> the > >>>>>>>>>>>>>> query result. > >>>>>>>>>>>>>>>>>>>>>>> But "hints" is just used for > >>> optimization > >>>>>>>> which > >>>>>>>>>>>> should > >>>>>>>>>>>>>> affect the > >>>>>>>>>>>>>>>>>>>>> result? > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> I think the "dynamic start > >> offset" > >>> is an > >>>>>> very > >>>>>>>>>>>> important > >>>>>>>>>>>>>> usability > >>>>>>>>>>>>>>>>>>>>> problem > >>>>>>>>>>>>>>>>>>>>>>> which will be faced by many > >>> streaming > >>>>>>>> platforms. > >>>>>>>>>>>>>>>>>>>>>>> I also agree "CREATE TEMPORARY > >>> TABLE Temp > >>>>>>>> (LIKE t) > >>>>>>>>>>>> WITH > >>>>>>>>>>>>>>>>>>>>>>> > >>> ('connector.startup-timestamp-millis' = > >>>>>>>>>>>>>> '1578538374471')" is verbose, > >>>>>>>>>>>>>>>>>>>>>> what > >>>>>>>>>>>>>>>>>>>>>>> if we have 10 tables to join? 
> >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> However, what I want to propose > >>> (should > >>>>>> be > >>>>>>>> another > >>>>>>>>>>>>>> thread) is a > >>>>>>>>>>>>>>>>>>>> global > >>>>>>>>>>>>>>>>>>>>>>> configuration to reset start > >>> offsets of > >>>>>> all > >>>>>>>> the > >>>>>>>>>>>> source > >>>>>>>>>>>>>> connectors > >>>>>>>>>>>>>>>>>>>>>>> in the query session, e.g. > >>>>>>>>>>>>> "table.sources.start-offset". > >>>>>>>>>>>>>> This is > >>>>>>>>>>>>>>>>>>>>> possible > >>>>>>>>>>>>>>>>>>>>>>> now because > >>> `TableSourceFactory.Context` > >>>>>> has > >>>>>>>>>>>>>> `getConfiguration` > >>>>>>>>>>>>>>>>>>>>>>> method to get the session > >>> configuration, > >>>>>> and > >>>>>>>> use it > >>>>>>>>>>>> to > >>>>>>>>>>>>>> create an > >>>>>>>>>>>>>>>>>>>>> adapted > >>>>>>>>>>>>>>>>>>>>>>> TableSource. > >>>>>>>>>>>>>>>>>>>>>>> Then we can also expose to SQL > >> CLI > >>> via > >>>>>> SET > >>>>>>>> command, > >>>>>>>>>>>>> e.g. > >>>>>>>>>>>>>> `SET > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>> 'table.sources.start-offset'='earliest';`, > >>>>>>>> which is > >>>>>>>>>>>>>> pretty simple and > >>>>>>>>>>>>>>>>>>>>>>> straightforward. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> This is very similar to KSQL's > >> `SET > >>>>>>>>>>>>>> 'auto.offset.reset'='earliest'` > >>>>>>>>>>>>>>>>>>>>> which > >>>>>>>>>>>>>>>>>>>>>>> is very helpful IMO. > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 22:29, > >> Timo > >>>>>> Walther < > >>>>>>>>>>>>>> twal...@apache.org> > >>>>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Hi Danny, > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> compared to the hints, FLIP-110 > >>> is > >>>>>> fully > >>>>>>>>>>> compliant > >>>>>>>>>>>> to > >>>>>>>>>>>>>> the SQL > >>>>>>>>>>>>>>>>>>>>> standard. 
> >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> I don't think that `CREATE > >>> TEMPORARY > >>>>>> TABLE > >>>>>>>> Temp > >>>>>>>>>>>> (LIKE > >>>>>>>>>>>>>> t) WITH > >>>>>>>>>>>>>>>>>>>> (k=v)` > >>>>>>>>>>>>>>>>>>>>> is > >>>>>>>>>>>>>>>>>>>>>>>> too verbose or awkward for the > >>> power of > >>>>>>>> basically > >>>>>>>>>>>>>> changing the > >>>>>>>>>>>>>>>>>>>> entire > >>>>>>>>>>>>>>>>>>>>>>>> connector. Usually, this > >>> statement > >>>>>> would > >>>>>>>> just > >>>>>>>>>>>> precede > >>>>>>>>>>>>>> the query in > >>>>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>>>>>> multiline file. So it can be > >>> change > >>>>>>>> "in-place" > >>>>>>>>>>> like > >>>>>>>>>>>>>> the hints you > >>>>>>>>>>>>>>>>>>>>>>> proposed. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Many companies have a > >>> well-defined set > >>>>>> of > >>>>>>>> tables > >>>>>>>>>>>> that > >>>>>>>>>>>>>> should be > >>>>>>>>>>>>>>>>>>>> used. > >>>>>>>>>>>>>>>>>>>>>> It > >>>>>>>>>>>>>>>>>>>>>>>> would be dangerous if users can > >>> change > >>>>>> the > >>>>>>>> path > >>>>>>>>>>> or > >>>>>>>>>>>>>> topic in a hint. > >>>>>>>>>>>>>>>>>>>>> The > >>>>>>>>>>>>>>>>>>>>>>>> catalog/catalog manager should > >>> be the > >>>>>>>> entity that > >>>>>>>>>>>>>> controls which > >>>>>>>>>>>>>>>>>>>>> tables > >>>>>>>>>>>>>>>>>>>>>>>> exist and how they can be > >>> accessed. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> what’s the problem there if > >> we > >>> user > >>>>>> the > >>>>>>>> table > >>>>>>>>>>>> hints > >>>>>>>>>>>>>> to support > >>>>>>>>>>>>>>>>>>>>>> “start > >>>>>>>>>>>>>>>>>>>>>>>> offset”? > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> IMHO it violates the meaning of > >>> a hint. 
> >>>>>>>> According > >>>>>>>>>>>> to > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>> dictionary, > >>>>>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>>>>>> hint is "a statement that > >>> expresses > >>>>>>>> indirectly > >>>>>>>>>>> what > >>>>>>>>>>>>>> one prefers not > >>>>>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>>> say explicitly". But offsets > >> are > >>> a > >>>>>>>> property that > >>>>>>>>>>>> are > >>>>>>>>>>>>>> very explicit. > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> If we go with the hint > >> approach, > >>> it > >>>>>> should > >>>>>>>> be > >>>>>>>>>>>>>> expressible in the > >>>>>>>>>>>>>>>>>>>>>>>> TableSourceFactory which > >>> properties are > >>>>>>>> supported > >>>>>>>>>>>> for > >>>>>>>>>>>>>> hinting. Or > >>>>>>>>>>>>>>>>>>>> do > >>>>>>>>>>>>>>>>>>>>>> you > >>>>>>>>>>>>>>>>>>>>>>>> plan to offer those hints in a > >>> separate > >>>>>>>>>>> Map<String, > >>>>>>>>>>>>>> String> that > >>>>>>>>>>>>>>>>>>>>> cannot > >>>>>>>>>>>>>>>>>>>>>>>> overwrite existing properties? > >> I > >>> think > >>>>>>>> this would > >>>>>>>>>>>> be > >>>>>>>>>>>>> a > >>>>>>>>>>>>>> different > >>>>>>>>>>>>>>>>>>>>>> story... > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> On 10.03.20 10:34, Danny Chan > >>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Timo ~ > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Personally I would say that > >>> offset > > >>>>>> 0 > >>>>>>>> and > >>>>>>>>>>> start > >>>>>>>>>>>>>> offset = 10 does > >>>>>>>>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>>>>>>>> have the same semantic, so from > >>> the SQL > >>>>>>>> aspect, > >>>>>>>>>>> we > >>>>>>>>>>>>> can > >>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>>>> implement > >>>>>>>>>>>>>>>>>>>>> a > >>>>>>>>>>>>>>>>>>>>>>>> “starting offset” hint for > >> query > >>> with > >>>>>> such > >>>>>>>> a > >>>>>>>>>>>> syntax. 
> >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> And the CREATE TABLE LIKE > >>> syntax is a > >>>>>>>> DDL which > >>>>>>>>>>>> is > >>>>>>>>>>>>>> just verbose > >>>>>>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>>>>>>>> defining such dynamic > >> parameters > >>> even > >>>>>> if > >>>>>>>> it could > >>>>>>>>>>>> do > >>>>>>>>>>>>>> that, shall we > >>>>>>>>>>>>>>>>>>>>>> force > >>>>>>>>>>>>>>>>>>>>>>>> users to define a temporal > >> table > >>> for > >>>>>> each > >>>>>>>> query > >>>>>>>>>>>> with > >>>>>>>>>>>>>> dynamic > >>>>>>>>>>>>>>>>>>>> params, > >>>>>>>>>>>>>>>>>>>>> I > >>>>>>>>>>>>>>>>>>>>>>>> would say it’s an awkward > >>> solution. > >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> "Hints should give "hints" > >> but > >>> not > >>>>>>>> affect the > >>>>>>>>>>>>> actual > >>>>>>>>>>>>>> produced > >>>>>>>>>>>>>>>>>>>>>> result.” > >>>>>>>>>>>>>>>>>>>>>>>> You mentioned that multiple > >>> times and > >>>>>>>> could we > >>>>>>>>>>>> give a > >>>>>>>>>>>>>> reason, > >>>>>>>>>>>>>>>>>>>> what’s > >>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>> problem there if we user the > >>> table > >>>>>> hints to > >>>>>>>>>>> support > >>>>>>>>>>>>>> “start offset” > >>>>>>>>>>>>>>>>>>>> ? 
> >>>>>>>>>>>>>>>>>>>>>>>>> From my side I saw some benefits for that:
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> • It’s very convenient to set up these parameters; the syntax is very much like the DDL definition
> >>>>>>>>>>>>>>>>>>>>>>>>> • Its scope is very clear: right on the table it is attached to
> >>>>>>>>>>>>>>>>>>>>>>>>> • It does not affect the table schema, which means that in order to specify the offset there is no need to define an offset column, which is weird actually; an offset should never be a column, it’s more like metadata or a start option.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> So in total, FLIP-110 uses the offset more like a Hive partition prune. We can do that if we have an offset column, but in most cases we do not define one, so there is actually no conflict or overlap.
> >>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan > >>>>>>>>>>>>>>>>>>>>>>>>> 在 2020年3月10日 +0800 > >> PM4:28,Timo > >>>>>> Walther < > >>>>>>>>>>>>>> twal...@apache.org>,写道: > >>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny, > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> shouldn't FLIP-110[1] solve > >>> most > >>>>>> of the > >>>>>>>>>>>> problems > >>>>>>>>>>>>>> we have around > >>>>>>>>>>>>>>>>>>>>>>> defining > >>>>>>>>>>>>>>>>>>>>>>>>>> table properties more > >>> dynamically > >>>>>>>> without > >>>>>>>>>>>> manual > >>>>>>>>>>>>>> schema work? > >>>>>>>>>>>>>>>>>>>> Also > >>>>>>>>>>>>>>>>>>>>>>>>>> offset definition is easier > >>> with > >>>>>> such a > >>>>>>>>>>> syntax. > >>>>>>>>>>>>>> They must not be > >>>>>>>>>>>>>>>>>>>>>>> defined > >>>>>>>>>>>>>>>>>>>>>>>>>> in catalog but could be > >>> temporary > >>>>>>>> tables that > >>>>>>>>>>>>>> extend from the > >>>>>>>>>>>>>>>>>>>>>> original > >>>>>>>>>>>>>>>>>>>>>>>>>> table. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> In general, we should aim > >> to > >>> keep > >>>>>> the > >>>>>>>> syntax > >>>>>>>>>>>>>> concise and don't > >>>>>>>>>>>>>>>>>>>>>> provide > >>>>>>>>>>>>>>>>>>>>>>>>>> too many ways of doing the > >>> same > >>>>>> thing. > >>>>>>>> Hints > >>>>>>>>>>>>>> should give "hints" > >>>>>>>>>>>>>>>>>>>>> but > >>>>>>>>>>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>>>>>>>>>> affect the actual produced > >>> result. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Some connector properties > >>> might > >>>>>> also > >>>>>>>> change > >>>>>>>>>>> the > >>>>>>>>>>>>>> plan or schema > >>>>>>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>>>>>> future. E.g. they might > >> also > >>> define > >>>>>>>> whether a > >>>>>>>>>>>>>> table source > >>>>>>>>>>>>>>>>>>>>> supports > >>>>>>>>>>>>>>>>>>>>>>>>>> certain push-downs (e.g. > >>> predicate > >>>>>>>>>>> push-down). 
> >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Dawid is currently working > >> a > >>> draft > >>>>>>>> that might > >>>>>>>>>>>>>> makes it possible > >>>>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>>>>> expose a Kafka offset via > >> the > >>>>>> schema > >>>>>>>> such > >>>>>>>>>>> that > >>>>>>>>>>>>>> `SELECT * FROM > >>>>>>>>>>>>>>>>>>>>> Topic > >>>>>>>>>>>>>>>>>>>>>>>>>> WHERE offset > 10` would > >>> become > >>>>>>>> possible and > >>>>>>>>>>>>> could > >>>>>>>>>>>>>> be pushed > >>>>>>>>>>>>>>>>>>>> down. > >>>>>>>>>>>>>>>>>>>>>> But > >>>>>>>>>>>>>>>>>>>>>>>>>> this is of course, not > >>> planned > >>>>>>>> initially. > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> > >>>>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>> On 10.03.20 08:34, Danny > >> Chan > >>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks Wenlong ~ > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> For PROPERTIES Hint Error > >>>>>> handling > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Actually we have no way > >> to > >>>>>> figure out > >>>>>>>>>>>> whether a > >>>>>>>>>>>>>> error prone > >>>>>>>>>>>>>>>>>>>> hint > >>>>>>>>>>>>>>>>>>>>>> is a > >>>>>>>>>>>>>>>>>>>>>>>> PROPERTIES hint, for example, > >> if > >>> use > >>>>>>>> writes a > >>>>>>>>>>> hint > >>>>>>>>>>>>> like > >>>>>>>>>>>>>>>>>>>> ‘PROPERTIAS’, > >>>>>>>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>>>> do > >>>>>>>>>>>>>>>>>>>>>>>> not know if 
this hint is a > >>> PROPERTIES > >>>>>>>> hint, what > >>>>>>>>>>> we > >>>>>>>>>>>>>> know is that > >>>>>>>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>>>>>> hint > >>>>>>>>>>>>>>>>>>>>>>>> name was not registered in our > >>> Flink. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> If the user writes the > >>> hint name > >>>>>>>> correctly > >>>>>>>>>>>>> (i.e. > >>>>>>>>>>>>>> PROPERTIES), > >>>>>>>>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>>>>>>>> did > >>>>>>>>>>>>>>>>>>>>>>>> can enforce the validation of > >>> the hint > >>>>>>>> options > >>>>>>>>>>>> though > >>>>>>>>>>>>>> the pluggable > >>>>>>>>>>>>>>>>>>>>>>>> HintOptionChecker. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> For PROPERTIES Hint > >> Option > >>> Format > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> For a key value style > >> hint > >>>>>> option, > >>>>>>>> the key > >>>>>>>>>>>> can > >>>>>>>>>>>>>> be either a > >>>>>>>>>>>>>>>>>>>> simple > >>>>>>>>>>>>>>>>>>>>>>>> identifier or a string literal, > >>> which > >>>>>>>> means that > >>>>>>>>>>>> it’s > >>>>>>>>>>>>>> compatible > >>>>>>>>>>>>>>>>>>>> with > >>>>>>>>>>>>>>>>>>>>>> our > >>>>>>>>>>>>>>>>>>>>>>>> DDL syntax. We support simple > >>>>>> identifier > >>>>>>>> because > >>>>>>>>>>>> many > >>>>>>>>>>>>>> other hints > >>>>>>>>>>>>>>>>>>>> do > >>>>>>>>>>>>>>>>>>>>>> not > >>>>>>>>>>>>>>>>>>>>>>>> have the component complex keys > >>> like > >>>>>> the > >>>>>>>> table > >>>>>>>>>>>>>> properties, and we > >>>>>>>>>>>>>>>>>>>>> want > >>>>>>>>>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>>>>>>>> unify the parse block. > >>>>>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>>>>>>>> Danny Chan > >>>>>>>>>>>>>>>>>>>>>>>>>>> 在 2020年3月10日 +0800 > >>>>>>>> PM3:19,wenlong.lwl < > >>>>>>>>>>>>>> wenlong88....@gmail.com > >>>>>>>>>>>>>>>>>>>>>>> ,写道: > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danny, thanks for > >> the > >>>>>> proposal. 
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> +1 for adding table hints; it is really a necessary feature for Flink SQL to integrate with a catalog.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> For error handling, I think it would be more natural to throw an exception when an erroneous table hint is provided, because the properties in the hint will be merged and used to find the table factory, which would cause an exception when erroneous properties are provided, right? On the other hand, unlike other hints which just affect the way the query is executed, the property table hint actually affects the result of the query, so we should never ignore the given property hints.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> For the format of property hints: currently, in the SQL client, we accept properties only in string format in DDL: 'connector.type'='kafka'. I think the format of properties in a hint should be the same as the format we defined in DDL. What do you think?
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Bests,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Wenlong Lyu
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <yuzhao....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To Weike: About the Error Handling
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To be consistent with other SQL vendors, the default is to log warnings, and if there is any error (invalid hint name or options), the hint is just ignored. I have already addressed this in the wiki.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> To Timo: About the PROPERTIES Table Hint
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> • The properties hint is also optional: the user can pass in an option to override the table properties, but this does not mean it is required.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> • They should not include semantics: do the properties belong to semantics? I don't think so; the plan does not change, right? The result set may be affected, but there are already some hints that do so, for example the MS-SQL MAXRECURSION and SNAPSHOT hints [1]
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard compared to the hints way (which is included in comments)
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> • I actually didn't find any vendors that support such grammar, and there is no way to override table level properties dynamically.
  For normal RDBMS, I think there are no requests for such dynamic parameters because all the tables have the same storage and computation, and they are almost all batch tables.
• Flink, as a computation engine, has many connectors. Especially for message queues like Kafka, we would have a start_offset which is different each time we start the query. Such parameters cannot be persisted to the catalog because they are not static. This is actually the background for proposing table hints to indicate such properties dynamically.

To Jark and Jingsong: I have removed the query hints part and changed the title.

[1] https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15

Best,
Danny Chan

On March 9, 2020 at 5:46 PM +0800, Timo Walther <twal...@apache.org> wrote:

Hi Danny,

thanks for the proposal. I agree with Jark and Jingsong. Planner hints and table hints are orthogonal topics that should be discussed separately.

I share Jingsong's opinion that we should not use planner hints for passing connector properties. Planner hints should be optional at any time. They should not include semantics but only affect execution time.
Connector properties are an important part of the query itself.

Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How do other vendors deal with this problem?

Regards,
Timo

On 09.03.20 10:37, Jingsong Li wrote:

Hi Danny, +1 for table hints, thanks for driving.

I took a look at the FLIP; most of the content is about query hints. That makes it hard to discuss and vote on. So +1 to split it as Jark said.

Another thing is the configuration that is considered suitable to set with table hints: "connector.path" and "connector.topic". Are they really suitable for table hints? Looks weird to me.
Because I think these properties are the core of the table.

Best,
Jingsong Lee

On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <imj...@gmail.com> wrote:

Thanks Danny for starting the discussion.
+1 for this feature.

If we just focus on the table hints, not the query hints, in this release, could you split the FLIP into two FLIPs? It's hard to vote on only part of a FLIP. You can keep the table hints proposal in FLIP-113 and move query hints into another FLIP, so that we can focus on the table hints in this FLIP.

Thanks,
Jark

On Mon, 9 Mar 2020 at 17:14, DONG, Weike <kyled...@connect.hku.hk> wrote:

Hi Danny,

This is a nice feature, +1.

One thing I am interested in, but which is not mentioned in the proposal, is error handling, as it is quite common for users to write inappropriate hints in SQL code. If illegal or "bad" hints are given, would the system simply ignore them or throw exceptions?

Thanks : )

Best,
Weike

On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <yuzhao....@gmail.com> wrote:

Note:
we only plan to support table hints in Flink release 1.11, so please focus mainly on the table hints part and just ignore the planner hints. Sorry for that mistake ~

Best,
Danny Chan

On March 9, 2020 at 4:36 PM +0800, Danny Chan <yuzhao....@gmail.com> wrote:

Hi, fellows ~

I would like to propose support for SQL hints in Flink SQL.

We would support hints syntax as follows:

select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
from
emp /*+ INDEX(idx1, idx2) */
join
dept /*+ PROPERTIES(k1='v1', k2='v2') */
on
emp.deptno = dept.deptno

Basically we would support both query hints (after the SELECT keyword) and table hints (after the referenced table name). For 1.11, we plan to only support table hints, with a hint probably named PROPERTIES:

table_name /*+ PROPERTIES(k1='v1', k2='v2') */

I am looking forward to your comments.

You can access the FLIP here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints

Best,
Danny Chan
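
To make the proposed PROPERTIES table hint concrete, here is a rough sketch of how hint options could be read from the comment and laid over a table's stored properties, as in the Kafka start_offset case discussed in this thread. It is not how Flink parses hints (the planner uses a real SQL parser, not regular expressions); the function names, the regular expressions and the `start_offset` key are assumptions made for this example.

```python
import re

# Matches a hint comment of the form /*+ PROPERTIES(k1='v1', k2='v2') */
HINT_RE = re.compile(r"/\*\+\s*PROPERTIES\s*\((.*?)\)\s*\*/", re.IGNORECASE | re.DOTALL)
# Matches one key='value' pair; the key may be bare or single-quoted.
PAIR_RE = re.compile(r"'?([\w.\-]+)'?\s*=\s*'([^']*)'")


def parse_properties_hint(hint_text):
    """Extract the key/value pairs from a PROPERTIES table hint, or {} if none is found."""
    match = HINT_RE.search(hint_text)
    if match is None:
        return {}
    return dict(PAIR_RE.findall(match.group(1)))


def effective_properties(catalog_properties, hint_text):
    """Overlay hint options on the stored table properties without mutating them."""
    merged = dict(catalog_properties)
    merged.update(parse_properties_hint(hint_text))
    return merged


# Hypothetical Kafka table whose start offset changes per query.
stored = {"connector.type": "kafka", "start_offset": "0"}
print(effective_properties(stored, "kafka_table /*+ PROPERTIES('start_offset'='42') */"))
```

The stored properties stay untouched, which matches the intent that dynamic values such as a start offset are supplied per query and never persisted to the catalog.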