Hi, everyone ~

Allows me to share some thoughts here.

Personally i think for SQL, "format" is obviously better than "format.name", it 
is more concise and straight-forward, similar with Presto FORMAT[2] and KSQL 
VALUE_FORMAT[1]; i think we move from "connector.type" to "connector" for the 
same reason, the "type" or "name" suffix is implicit, SQL syntax like the DDL 
is a top-level user API, so from my side keeping good user-friendly syntax is 
more important.

@Timo I'm big +1 for the a good code style guide, but that does not mean we 
should go for a json-style table options in the DDL, the DDL could have its own 
contract. Can we move "represent these config options in YAML" to another topic 
? Otherwise, how should we handle the "connector" key, should we prefix all the 
table options with "connector" ? The original inention of FLIP-122 is to remove 
some redundant prefix/suffix of the table options because they are obviously 
implicit there, and the "connector." prefix and the ".type" or ".name" suffix 
are the ones we most want to delete.

@Dawid Although ".type" is just another 4 characters, but we force the SQL 
users to do the thing that is obvious reduadant, i know serialize catalog table 
to YAML or use the options in DataStream has similar keys request, but they are 
different use cases that i believe many SQL user would not encounter, that 
means we force many users to obey rules for cases they would never have.


[1] https://docs.ksqldb.io/en/latest/developer-guide/create-a-table/
[2] https://prestodb.io/docs/current/sql/create-table.html

Best,
Danny Chan
在 2020年5月4日 +0800 PM11:34,Till Rohrmann <trohrm...@apache.org>,写道:
> Hi everyone,
>
> I like Timo's proposal to organize our configuration more hierarchical
> since this is what the coding guide specifies. The benefit I see is that
> config options belonging to the same concept will be found in the same
> nested object. Moreover, it will be possible to split the configuration
> into unrelated parts which are fed to the respective components. That way
> one has a much better separation of concern since component A cannot read
> the configuration of component B.
>
> Concerning Timo's last two proposals:
>
> If fail-on-missing-field is a common configuration shared by all formats,
> then I would go with the first option:
>
> format.kind: json
> format.fail-on-missing-field: true
>
> If fail-on-missing-field is specific for json, then one could go with
>
> format: json
> json.fail-on-missing-field: true
>
> or
>
> format.kind: json
> format.json.fail-on-missing-field: true
>
> Cheers,
> Till
>
>
> On Fri, May 1, 2020 at 11:55 AM Timo Walther <twal...@apache.org> wrote:
>
> > Hi Jark,
> >
> > yes, in theory every connector can design options as they like. But for
> > user experience and good coding style we should be consistent in Flink
> > connectors and configuration. Because implementers of new connectors
> > will copy the design of existing ones.
> >
> > Furthermore, I could image that people in the DataStream API would also
> > like to configure their connector based on options in the near future.
> > It might be the case that Flink DataStream API connectors will reuse the
> > ConfigOptions from Table API for consistency.
> >
> > I'm favoring either:
> >
> > format.kind = json
> > format.fail-on-missing-field: true
> >
> > Or:
> >
> > format = json
> > json.fail-on-missing-field: true
> >
> > Both are valid hierarchies.
> >
> > Regards,
> > Timo
> >
> >
> > On 30.04.20 17:57, Jark Wu wrote:
> > > Hi Dawid,
> > >
> > > I just want to mention one of your response,
> > >
> > > > What you described with
> > > > 'format' = 'csv',
> > > > 'csv.allow-comments' = 'true',
> > > > 'csv.ignore-parse-errors' = 'true'
> > > > would not work though as the `format` prefix is mandatory in the sources
> > > as only the properties with format
> > > > will be passed to the format factory in majority of cases. We already
> > > have some implicit contracts.
> > >
> > > IIUC, in FLIP-95 and FLIP-122, the property key style are totally decided
> > > by connectors, not the framework.
> > > So I custom connector can define above properties, and extract the value
> > of
> > > 'format', i.e. 'csv', to find the format factory.
> > > And extract the properties with `csv.` prefix and remove the prefix, and
> > > pass the properties (e.g. 'allow-comments' = 'true')
> > > into the format factory to create format.
> > >
> > > So there is no a strict guarantee to have a "nested JSON style"
> > properties.
> > > Users can still develop a custom connector with this
> > > un-hierarchy properties and works well.
> > >
> > > 'format' = 'json',
> > > 'format.fail-on-missing-field' = 'false'
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz <dwysakow...@apache.org>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to start with a comment that I am ok with the current state of
> > > > the FLIP-122 if there is a strong preference for it. Nevertheless I
> > still
> > > > like the idea of adding `type` to the `format` to have it as
> > `format.type`
> > > > = `json`.
> > > >
> > > > I wanted to clarify a few things though:
> > > >
> > > > @Jingsong As far as I see it most of the users copy/paste the properties
> > > > from the documentation to the SQL, so I don't think additional four
> > > > characters are too cumbersome. Plus if you force the additional suffix
> > onto
> > > > all the options of a format you introduce way more boilerplate than if
> > we
> > > > added the `type/kind/name`
> > > >
> > > > @Kurt I agree that we cannot force it, but I think it is more of a
> > > > question to set standards/implicit contracts on the properties. What you
> > > > described with
> > > > 'format' = 'csv',
> > > > 'csv.allow-comments' = 'true',
> > > > 'csv.ignore-parse-errors' = 'true'
> > > >
> > > > would not work though as the `format` prefix is mandatory in the sources
> > > > as only the properties with format will be passed to the format factory
> > in
> > > > majority of cases. We already have some implicit contracts.
> > > >
> > > > @Forward I did not necessarily get the example. Aren't json and bson two
> > > > separate formats? Do you mean you can have those two at the same time?
> > Why
> > > > do you need to differentiate the options for each? The way I see it is:
> > > >
> > > > ‘format(.name)' = 'json',
> > > > ‘format.fail-on-missing-field' = 'false'
> > > >
> > > > or
> > > >
> > > > ‘format(.name)' = 'bson',
> > > > ‘format.fail-on-missing-field' = 'false'
> > > >
> > > > @Benchao I'd be fine with any of name, kind, type(this we already had in
> > > > the past)
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > > On 30/04/2020 04:17, Forward Xu wrote:
> > > >
> > > > Here I have a little doubt. At present, our json only supports the
> > > > conventional json format. If we need to implement json with bson, json
> > with
> > > > avro, etc., how should we express it?
> > > > Do you need like the following:
> > > >
> > > > ‘format.name' = 'json',
> > > >
> > > > ‘format.json.fail-on-missing-field' = 'false'
> > > >
> > > >
> > > > ‘format.name' = 'bson',
> > > >
> > > > ‘format.bson.fail-on-missing-field' = ‘false'
> > > >
> > > >
> > > > Best,
> > > >
> > > > Forward
> > > >
> > > > Benchao Li <libenc...@gmail.com> <libenc...@gmail.com> 于2020年4月30日周四
> > 上午9:58写道:
> > > >
> > > >
> > > > Thanks Timo for staring the discussion.
> > > >
> > > > Generally I like the idea to keep the config align with a standard like
> > > > json/yaml.
> > > >
> > > > From the user's perspective, I don't use table configs from a config
> > file
> > > > like yaml or json for now,
> > > > And it's ok to change it to yaml like style. Actually we didn't know
> > that
> > > > this could be a yaml like
> > > > configuration hierarchy. If it has a hierarchy, we maybe consider that
> > in
> > > > the future to load the
> > > > config from a yaml/json file.
> > > >
> > > > Regarding the name,
> > > > 'format.kind' looks fine to me. However there is another name from the
> > top
> > > > of my head:
> > > > 'format.name', WDYT?
> > > >
> > > > Dawid Wysakowicz <dwysakow...@apache.org> <dwysakow...@apache.org>
> > 于2020年4月29日周三 下午11:56写道:
> > > >
> > > >
> > > > Hi all,
> > > >
> > > > I also wanted to share my opinion.
> > > >
> > > > When talking about a ConfigOption hierarchy we use for configuring Flink
> > > > cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> > > > compatible style. Those options are primarily read from a file and thus
> > > > should at least try to follow common practices for nested formats if we
> > > > ever decide to switch to one.
> > > >
> > > > Here the question is about the properties we use in SQL statements. The
> > > > origin/destination of these usually will be external catalog, usually in
> > > >
> > > > a
> > > >
> > > > flattened(key/value) representation so I agree it is not as important as
> > > >
> > > > in
> > > >
> > > > the aforementioned case. Nevertheless having a yaml based catalog or
> > > >
> > > > being
> > > >
> > > > able to have e.g. yaml based snapshots of a catalog in my opinion is
> > > > appealing. At the same time cost of being able to have a nice
> > > > yaml/hocon/json representation is just adding a single suffix to a
> > > > single(at most 2 key + value) property. The question is between `format`
> > > >
> > > > =
> > > >
> > > > `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> > > > doing it.
> > > >
> > > > Just to have a full picture. Both cases can be represented in yaml, but
> > > > the difference is significant:
> > > > format: 'json'
> > > > format.option: 'value'
> > > >
> > > > vs
> > > > format:
> > > > kind: 'json'
> > > >
> > > > option: 'value'
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > > On 29/04/2020 17:13, Flavio Pompermaier wrote:
> > > >
> > > > Personally I don't have any preference here. Compliance wih standard
> > > >
> > > > YAML
> > > >
> > > > parser is probably more important
> > > >
> > > > On Wed, Apr 29, 2020 at 5:10 PM Jark Wu <imj...@gmail.com> <
> > imj...@gmail.com> wrote:
> > > >
> > > >
> > > > From a user's perspective, I prefer the shorter one "format=json",
> > > >
> > > > because
> > > >
> > > > it's more concise and straightforward. The "kind" is redundant for
> > > >
> > > > users.
> > > >
> > > > Is there a real case requires to represent the configuration in JSON
> > > > style?
> > > > As far as I can see, I don't see such requirement, and everything works
> > > > fine by now.
> > > >
> > > > So I'm in favor of "format=json". But if the community insist to follow
> > > > code style on this, I'm also fine with the longer one.
> > > >
> > > > Btw, I also CC user mailing list to listen more user's feedback.
> > > >
> > > > Because I
> > > >
> > > > think this is relative to usability.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler <ches...@apache.org> <
> > ches...@apache.org>
> > > > wrote:
> > > >
> > > >
> > > > > Therefore, should we advocate instead:
> > > > >
> > > > > 'format.kind' = 'json',
> > > > > 'format.fail-on-missing-field' = 'false'
> > > >
> > > > Yes. That's pretty much it.
> > > >
> > > > This is reasonable important to nail down as with such violations I
> > > > believe we could not actually switch to a standard YAML parser.
> > > >
> > > > On 29/04/2020 16:05, Timo Walther wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > discussions around ConfigOption seem to be very popular recently.
> > > >
> > > > So I
> > > >
> > > > would also like to get some opinions on a different topic.
> > > >
> > > > How do we represent hierarchies in ConfigOption? In FLIP-122, we
> > > > agreed on the following DDL syntax:
> > > >
> > > > CREATE TABLE fs_table (
> > > > ...
> > > > ) WITH (
> > > > 'connector' = 'filesystem',
> > > > 'path' = 'file:///path/to/whatever',
> > > > 'format' = 'csv',
> > > > 'format.allow-comments' = 'true',
> > > > 'format.ignore-parse-errors' = 'true'
> > > > );
> > > >
> > > > Of course this is slightly different from regular Flink core
> > > > configuration but a connector still needs to be configured based on
> > > > these options.
> > > >
> > > > However, I think this FLIP violates our code style guidelines
> > > >
> > > > because
> > > >
> > > > 'format' = 'json',
> > > > 'format.fail-on-missing-field' = 'false'
> > > >
> > > > is an invalid hierarchy. `format` cannot be a string and a top-level
> > > > object at the same time.
> > > >
> > > > We have similar problems in our runtime configuration:
> > > >
> > > > state.backend=
> > > > state.backend.incremental=
> > > > restart-strategy=
> > > > restart-strategy.fixed-delay.delay=
> > > > high-availability=
> > > > high-availability.cluster-id=
> > > >
> > > > The code style guide states "Think of the configuration as nested
> > > > objects (JSON style)". So such hierarchies cannot be represented in
> > > >
> > > > a
> > > >
> > > > nested JSON style.
> > > >
> > > > Therefore, should we advocate instead:
> > > >
> > > > 'format.kind' = 'json',
> > > > 'format.fail-on-missing-field' = 'false'
> > > >
> > > > What do you think?
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > > [1]
> > > >
> > > >
> > > >
> > https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> > > >
> > > > --
> > > >
> > > > Benchao Li
> > > > School of Electronics Engineering and Computer Science, Peking
> > UniversityTel:+86-15650713730
> > > > Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > > >
> > > >
> > >
> >
> >

Reply via email to