Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Xuefu Z Fri, 04 Oct 2019 17:42:13 -0700

I agree with Timo that the new table APIs need to be consistent. I'd go
further that an name (or id) is needed for module definition in YAML file.
In the current design, name is skipped and type has binary meanings.


Thanks,
Xuefu

On Fri, Oct 4, 2019 at 5:24 AM Timo Walther <twal...@apache.org> wrote:

> Hi everyone,
>
> first, I was also questioning my proposal. But Bowen's proposal of
> `tEnv.offloadToYaml(<file_path>)` would not work with the current design
> because we don't know how to serialize a catalog or module into
> properties. Currently, there is no converter from instance to
> properties. It is a one way conversion. We can add a `toProperties`
> method to both Catalog and Module class in the future to solve this.
> Solving the table environment serializability can be future work.
>
> However, I find the current proposal for the TableEnvironment methods is
> contradicting:
>
> tableEnv.loadModule(new Yyy());
> tableEnv.unloadModule(“Xxx”);
>
> The loading is specified programmatically whereas the unloading requires
> a string that is not specified in the module itself. But is defined in
> the factory according to the current design.
>
> SQL does it more consistently. There, the name `xxx` is used when
> loading and unloading the module:
>
> LOAD MODULE 'xxx' [WITH ('prop'='myProp', ...)]
> UNLOAD MODULE 'xxx’
>
> How about:
>
> tableEnv.loadModule("xxx", new Yyy());
> tableEnv.unloadModule(“xxx”);
>
> This would be similar to the catalog interfaces. The name is not part of
> the instance itself.
>
> What do you think?
>
> Thanks,
> Timo
>
>
>
>
> On 01.10.19 21:17, Bowen Li wrote:
> > If something like the yaml file is the way to go and achieve such
> > motivation, we would cover that with current design.
> >
> > On Tue, Oct 1, 2019 at 12:05 Bowen Li <bowenl...@gmail.com> wrote:
> >
> >> Hi Timo, Dawid,
> >>
> >> I've added the suggested SQL and related changes to TableEnvironment API
> >> and other classes to the google doc. Also removed "USE MODULE" and its
> >> APIs. Will update FLIP wiki once we have a consensus.
> >>
> >> W.r.t. descriptor approach, my gut feeling is similar to Dawid's.
> Besides,
> >> I feel yaml file would be a better solution to persist serializable
> state
> >> of an environment as the file itself is in serializable format already.
> >> Though yaml file only serves SQL CLI at this moment, we may be able to
> >> extend its reach to Table API and allow users to load/offload a
> >> TableEnvironment from/to yaml files, as something like "TableEnvironment
> >> tEnv = TableEnvironment.loadFromYaml(<file_path>)" and
> >> "tEnv.offloadToYaml(<file_path>)" to restore and persist state, and try
> to
> >> make yaml file more expressive.
> >>
> >>
> >> On Tue, Oct 1, 2019 at 6:47 AM Dawid Wysakowicz <dwysakow...@apache.org
> >
> >> wrote:
> >>
> >>> Hi Timo, Bowen,
> >>>
> >>> Unfortunately I did not have enough time to go through all the
> >>> suggestions in details so I can not comment on the whole FLIP.
> >>>
> >>> I just wanted to give my opinion on the "descriptor approach in
> >>> loadModule" part. I am not sure if we need it here. We might be
> >>> overthinking this a bit. It definitely makes sense for objects like
> >>> TableSource/TableSink etc. as they are logical definitions that nearly
> >>> always have to be persisted in a Catalog. I'm not sure if we really
> need
> >>> the same for a whole session. If we need a resume session feature, the
> >>> way to go would probably be to keep the session in memory on the server
> >>> side. I fear we will never be able to serialize the whole session
> >>> entirely (temporary objects, objects derived from DataStream etc.)
> >>>
> >>> I think it is ok to use instances for objects like Catalogs or Modules
> >>> and have an overlay on top of that that can create instances from
> >>> properties.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>>
> >>> On 01/10/2019 11:28, Timo Walther wrote:
> >>>> Hi Bowen,
> >>>>
> >>>> thanks for your response.
> >>>>
> >>>> Re 2) I also don't have a better approach for this issue. It is
> >>>> similar to changing the general TableConfig between two statements. It
> >>>> would be good to add your explanation to the design document.
> >>>>
> >>>> Re 3) It would be interesting to know about which "core" functions we
> >>>> are actually talking about. Also for the overriding built-in functions
> >>>> that we discussed in the other FLIP. But I'm fine with leaving it to
> >>>> the user for now. How about we just introduce loadModule(),
> >>>> unloadModule() methods instead of useModules()? This would ensure that
> >>>> users don't forget to add the core module when adding an additional
> >>>> module and they need to explicitly call "unloadModule('core')".
> >>>>
> >>>> Re 4) Every table environment feature should also be designed with SQL
> >>>> statements in mind to verify the concept. SQL is also more popular
> >>>> that Java/Scala API or YAML file. I would like to add it to 1.10 for
> >>>> marking the feature as complete.
> >>>>
> >>>> SHOW MODULES -> sounds good to me, we should add a listModules():
> >>>> List<String> method to table environment
> >>>>
> >>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] --> we should add a
> >>>> loadModule() method to table environment
> >>>>
> >>>> UNLOAD MODULE 'hive' --> we should add a unloadModule() method to
> >>>> table environment
> >>>>
> >>>> I would not introduce `USE MODULES 'x' 'y' 'z'` for simplicity and
> >>>> concise API. Users need to load the module anyway with properties.
> >>>> They can also load them "in order" immediately. CREATE TABLE can also
> >>>> not create multiple tables but only one at a time in that order.
> >>>>
> >>>> One thing that came to my mind, shall we use a descriptor approach for
> >>>> loadModule()? The past has shown that passing instances causes
> >>>> problems when persisting objects. That's why we also want to get rid
> >>>> of registerTableSource. I could image that users might want to persist
> >>>> a table environment's state for later use in the future. Even though
> >>>> this is future work, we should already keep such use cases in mind
> >>>> when adding new API methods. What do you think?
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 30.09.19 23:17, Bowen Li wrote:
> >>>>> Hi Timo,
> >>>>>
> >>>>> Re 1) I agree. I renamed the title to "Extend Core Table System with
> >>>>> Pluggable Modules" and all internal references
> >>>>>
> >>>>> Re 2) First, I'll rename the API to useModules(). The design doesn't
> >>>>> forbid
> >>>>> users to call useModules() multi times. Objects in modules are loaded
> >>> on
> >>>>> demand instead of eagerly, so there won't be inconsistency. Users
> >>>>> have to
> >>>>> be fully aware of the consequences of resetting modules as that might
> >>>>> cause
> >>>>> that some objects can not be referenced anymore or resolution order
> >>>>> of some
> >>>>> objects changes.
> >>>>>
> >>>>> Re 3) Yes, we'd leave that to users.
> >>>>>
> >>>>> Another approach can be to have a non-optional "Core" module for all
> >>>>> objects that cannot be overrode like "CAST" and "AS" functions, and
> >>>>> have an
> >>>>> optional "ExtendedCore" module for other replaceable built-in
> objects.
> >>>>> "Core" should be positioned at the 1st in module list all the time.
> >>>>>
> >>>>> I'm fine with either solution.
> >>>>>
> >>>>> Re 4) It may sound like a nice-to-have advanced feature for 1.10, but
> >>> we
> >>>>> can surely fully discuss it for the sake of feature completeness.
> >>>>>
> >>>>> Unlike other configs, the order of modules would matter in Flink, and
> >>> it
> >>>>> implies the LOAD/UNLOAD commands would not be equal in operation
> >>>>> positions.
> >>>>> IIUYC, LOAD MODULE 'x' would be interpreted as appending x to the end
> >>> of
> >>>>> module list, and UNLOAD MODULE 'x' would be interpreted as removing x
> >>>>> from
> >>>>> any position in the list?
> >>>>>
> >>>>> I'm thinking of the following list of commands:
> >>>>>
> >>>>> SHOW MODULES - list modules in order
> >>>>> LOAD MODULE 'hive' [WITH ('prop'='myProp', ...)] - load and append
> the
> >>>>> module to end of the module list
> >>>>> UNLOAD MODULE 'hive' - remove the module from module list, and other
> >>>>> modules remain the same relative positions
> >>>>> USE MODULES 'x' 'y' 'z' (wondering can parser take "'x' 'y' 'z'"?),
> >>>>> or USE
> >>>>> MODULES 'x,y,z' - to reorder module list completely
> >>>>>
> >>>
>
>

-- 
Xuefu Zhang

"In Honey We Trust!"

Re: [DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins

Reply via email to