Hi Timo,

Thanks for the effort and for writing up this document. I like the idea of making flink-table Scala-free, so +1 for the proposal!
It's good to make Java the first-class citizen. For a long time, we have neglected Java, so many Table features are missing Java test cases, such as this one [1] I found recently. I think we may also need to migrate our test cases, i.e., add Java tests. This is definitely a big change and will break API compatibility. To reduce the impact on users, I think we should move quickly when migrating user-facing APIs; it would be better to introduce the user-sensitive changes within a single release. However, that may not be easy. I can help to contribute.

Separation of interface and implementation is a good idea. This may let users pull in a minimum of dependencies, or even none. I saw your reply in the Google doc. Java 8 already supports static methods in interfaces; I think we can make use of that?

Best, Hequn

[1] https://issues.apache.org/jira/browse/FLINK-11001

On Fri, Nov 23, 2018 at 5:36 PM Timo Walther <twal...@apache.org> wrote:
> Hi everyone,
>
> thanks for the great feedback so far. I updated the document with the
> input I got so far.
>
> @Fabian: I moved the porting of flink-table-runtime classes up in the list.
>
> @Xiaowei: Could you elaborate what "interface only" means to you? Do you
> mean a module containing pure Java `interface`s? Or is the validation
> logic also part of the API module? Are 50+ expression classes part of
> the API interface or already too implementation-specific?
>
> @Xuefu: I extended the document by almost a page to clarify when we
> should develop in Scala and when in Java. As Piotr said, every new Scala
> line is instant technical debt.
>
> Thanks,
> Timo
>
>
> On 23.11.18 at 10:29, Piotr Nowojski wrote:
> > Hi Timo,
> >
> > Thanks for writing this down. +1 from my side :)
> >
> >> I'm wondering whether we can have a rule, in the interim while Java
> >> and Scala coexist, that dependencies can only be one-way. I found that in the
> >> current code base there are cases where a Scala class extends Java and vice
> >> versa.
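[Editor's note: Hequn's point above about Java 8 static interface methods and the proposed `Table` & `TableImpl` separation can be sketched as follows. All class names and method bodies here are hypothetical illustrations mirroring the proposal, not actual Flink code.]

```java
// A minimal sketch of the interface/implementation separation discussed in
// this thread (hypothetical names mirroring the proposed `Table` & `TableImpl`
// split; NOT actual Flink code). A Java 8 static interface method acts as the
// factory, so the implementation class never leaks into the API module.
interface Table {
    Table select(String fields);
    String explain();

    // Java 8 static interface method: the entry point lives on the interface.
    static Table create(String source) {
        return new TableImpl("scan(" + source + ")");
    }
}

// Would live in a separate implementation module, invisible to API users.
final class TableImpl implements Table {
    private final String plan;

    TableImpl(String plan) {
        this.plan = plan;
    }

    @Override
    public Table select(String fields) {
        return new TableImpl(plan + " -> select(" + fields + ")");
    }

    @Override
    public String explain() {
        return plan;
    }
}

public class StaticFactoryDemo {
    public static void main(String[] args) {
        Table t = Table.create("orders").select("amount");
        System.out.println(t.explain()); // scan(orders) -> select(amount)
    }
}
```

With this pattern, an API-only module could ship just the `Table` interface, and user code would never name the implementation class directly.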
> >> This is quite painful. I'm thinking if we could say that extensions
> >> can only go from Java to Scala, which would help the situation. However, I'm
> >> not sure if this is practical.
> >
> > Xuefu: I’m also not sure what’s the best approach here; probably we will
> > have to work it out as we go. One thing to consider is that from now on,
> > every single new code line written in Scala anywhere in flink-table (except
> > flink-table-api-scala) is instant technological debt. From this
> > perspective I would be in favour of tolerating quite big inconveniences
> > just to avoid any new Scala code.
> >
> > Piotrek
> >
> >> On 23 Nov 2018, at 03:25, Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
> >>
> >> Hi Timo,
> >>
> >> Thanks for the effort and the Google writeup. During our external
> >> catalog rework, we found much confusion between Java and Scala, and this
> >> Scala-free roadmap should greatly mitigate that.
> >>
> >> I'm wondering whether we can have a rule, in the interim while Java
> >> and Scala coexist, that dependencies can only be one-way. I found that in the
> >> current code base there are cases where a Scala class extends Java and vice
> >> versa. This is quite painful. I'm thinking if we could say that extensions
> >> can only go from Java to Scala, which would help the situation. However, I'm
> >> not sure if this is practical.
> >>
> >> Thanks,
> >> Xuefu
> >>
> >>
> >> ------------------------------------------------------------------
> >> Sender: jincheng sun <sunjincheng...@gmail.com>
> >> Sent at: 2018 Nov 23 (Fri) 09:49
> >> Recipient: dev <dev@flink.apache.org>
> >> Subject: Re: [DISCUSS] Long-term goal of making flink-table Scala-free
> >>
> >> Hi Timo,
> >> Thanks for initiating this great discussion.
> >>
> >> Currently, using SQL/Table API requires including many dependencies. In
> >> particular, it is not necessary to introduce the specific implementation
> >> dependencies, which users do not care about.
> >> So I am glad to see your
> >> proposal, and I hope we consider splitting the API interface into a
> >> separate module, so that users can introduce a minimum of dependencies.
> >>
> >> So, +1 to [separation of interface and implementation; e.g. `Table` &
> >> `TableImpl`] which you mentioned in the Google doc.
> >> Best,
> >> Jincheng
> >>
> >> On Thu, Nov 22, 2018 at 10:50 PM, Xiaowei Jiang <xiaow...@gmail.com> wrote:
> >>
> >>> Hi Timo, thanks for driving this! I think that this is a nice thing to do.
> >>> While we are doing this, can we also keep in mind that we want to
> >>> eventually have a Table API interface-only module which users can take a
> >>> dependency on, but without including any implementation details?
> >>>
> >>> Xiaowei
> >>>
> >>> On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhue...@gmail.com> wrote:
> >>>
> >>>> Hi Timo,
> >>>>
> >>>> Thanks for writing up this document.
> >>>> I like the new structure and agree to prioritize the porting of the
> >>>> flink-table-common classes.
> >>>> Since flink-table-runtime is (or should be) independent of the API and
> >>>> planner modules, we could start porting these classes once the code is
> >>>> split into the new module structure.
> >>>> The benefit of a Scala-free flink-table-runtime would be a Scala-free
> >>>> execution JAR.
> >>>>
> >>>> Best, Fabian
> >>>>
> >>>>
> >>>> On Thu, Nov 22, 2018 at 10:54, Timo Walther <twal...@apache.org> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> I would like to continue this discussion thread and convert the outcome
> >>>>> into a FLIP so that users and contributors know what to expect in the
> >>>>> upcoming releases.
> >>>>>
> >>>>> I created a design document [1] that clarifies our motivation for why we
> >>>>> want to do this, what a Maven module structure could look like, and a
> >>>>> suggestion for a migration plan.
> >>>>>
> >>>>> It would be great to start with the efforts for the 1.8 release so
> >>>>> that new features can be developed in Java and major refactorings, such
> >>>>> as improvements to the connectors and external catalog support, are not
> >>>>> blocked.
> >>>>>
> >>>>> Please let me know what you think.
> >>>>>
> >>>>> Regards,
> >>>>> Timo
> >>>>>
> >>>>> [1]
> >>>>> https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
> >>>>>
> >>>>> On 02.07.18 at 17:08, Fabian Hueske wrote:
> >>>>>> Hi Piotr,
> >>>>>>
> >>>>>> thanks for bumping this thread, and thanks to Xingcan for the comments.
> >>>>>> I think the first step would be to separate the flink-table module into
> >>>>>> multiple sub-modules. These could be:
> >>>>>>
> >>>>>> - flink-table-api: all API-facing classes. Can later be divided further
> >>>>>>   into Java/Scala Table API/SQL
> >>>>>> - flink-table-planning: involves all planning (basically everything we do
> >>>>>>   with Calcite)
> >>>>>> - flink-table-runtime: the runtime code
> >>>>>>
> >>>>>> IMO, a realistic mid-term goal is to have the runtime module and certain
> >>>>>> parts of the planning module ported to Java.
> >>>>>> The api module will be much harder to port because of several dependencies
> >>>>>> on Scala core classes (the parser framework, tree iterations, etc.). I'm
> >>>>>> not saying we should not port this to Java, but it is not clear to me (yet)
> >>>>>> how to do it.
> >>>>>>
> >>>>>> I think flink-table-runtime should not be too hard to port. The code does
> >>>>>> not make use of many Scala features, i.e., it is written in a very Java-like
> >>>>>> style. Also, there are not many dependencies, and operators can be
> >>>>>> individually ported step-by-step.
> >>>>>> For flink-table-planning, we can have certain packages that we port to Java,
> >>>>>> like planning rules or plan nodes. The related classes mostly extend
> >>>>>> Calcite's Java interfaces/classes and would be natural choices for being
> >>>>>> ported. The code generation classes will require more effort to port. There
> >>>>>> are also some dependencies in planning on the api module that we would need
> >>>>>> to resolve somehow.
> >>>>>>
> >>>>>> For SQL, most work when adding new features is done in the planning and
> >>>>>> runtime modules. So, this separation should already reduce the "technological
> >>>>>> debt" quite a lot.
> >>>>>> The Table API depends much more on Scala than SQL does.
> >>>>>>
> >>>>>> Cheers, Fabian
> >>>>>>
> >>>>>>
> >>>>>> 2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingc...@gmail.com>:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I have also been thinking about this problem these days, and here are my thoughts.
> >>>>>>>
> >>>>>>> 1) We must admit that it’s really a tough task to interoperate between Java
> >>>>>>> and Scala. E.g., they have different collection types (Scala collections
> >>>>>>> vs. java.util.*), and in Java it's hard to implement a method which takes
> >>>>>>> Scala functions as parameters. Considering that the major part of the code
> >>>>>>> base is implemented in Java, +1 for this goal from a long-term view.
> >>>>>>>
> >>>>>>> 2) The ideal solution would be to just expose a Scala API and make all the
> >>>>>>> other parts Scala-free. But I am not sure if that could be achieved even in
> >>>>>>> the long term. Thus, as Timo suggested, keeping the Scala code in
> >>>>>>> "flink-table-core" would be a compromise solution.
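[Editor's note: Xingcan's interop point above can be illustrated with a small sketch. This is illustrative only, not Flink code: a Java-first API built on `java.util.function.Function` is trivially callable from both languages, whereas the Scala-first equivalent taking `scala.Function1` forces Java callers to depend on scala-library and implement the function trait by hand.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Illustrative only (not Flink code): a higher-order method declared with
// java.util.function.Function accepts any Java lambda or method reference.
// The same method declared with scala.Function1 would be painful to call
// from Java, which is the interop cost Xingcan describes above.
public class InteropSketch {

    // A Java-friendly higher-order method.
    static <I, O> List<O> mapAll(List<I> input, Function<I, O> fn) {
        List<O> out = new ArrayList<>();
        for (I element : input) {
            out.add(fn.apply(element));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> lengths =
                mapAll(Arrays.asList("select", "from", "where"), String::length);
        System.out.println(lengths); // [6, 4, 5]
    }
}
```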
> >>>>>>>
> >>>>>>> 3) If the community makes the final decision, maybe any new features
> >>>>>>> should be added in Java (regardless of the module), in order to prevent
> >>>>>>> the Scala code from growing.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Xingcan
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <pi...@data-artisans.com> wrote:
> >>>>>>>> Bumping the topic.
> >>>>>>>>
> >>>>>>>> If we want to do this, the sooner we decide, the less code we will have
> >>>>>>>> to rewrite. I have some objections/counter-proposals to Fabian's proposal
> >>>>>>>> of doing it module-wise and one module at a time.
> >>>>>>>> First, I do not see a problem with having Java/Scala code even within one
> >>>>>>>> module, especially not if there are clean boundaries. For example, we could
> >>>>>>>> have the API in Scala and optimizer rules/logical nodes written in Java in
> >>>>>>>> the same module. However, I haven’t maintained mixed Scala/Java code bases
> >>>>>>>> before, so I might be missing something here.
> >>>>>>>> Secondly, this whole migration might, and most likely will, take longer
> >>>>>>>> than expected, and that creates a problem for the new code that we will be
> >>>>>>>> creating. After making the decision to migrate to Java, almost any new Scala
> >>>>>>>> line of code will immediately be technological debt, and we will have to
> >>>>>>>> rewrite it in Java later.
> >>>>>>>> Thus I would propose first to state our end goal - the module structure and
> >>>>>>>> which parts of the modules we eventually want to be Scala-free - and
> >>>>>>>> secondly to take all the steps necessary to allow us to write new code
> >>>>>>>> compliant with our end goal. Only after that should/could we focus on
> >>>>>>>> incrementally rewriting the old code.
> >>>>>>>> Otherwise we could be stuck/blocked for years
> >>>>>>>> writing new code in Scala (and increasing the technological debt), because
> >>>>>>>> nobody has found the time to rewrite some unimportant and not actively
> >>>>>>>> developed part of some module.
> >>>>>>>> Piotrek
> >>>>>>>>
> >>>>>>>>> On 14 Jun 2018, at 15:34, Fabian Hueske <fhue...@gmail.com> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> In general, I think this is a good effort. However, it won't be easy, and
> >>>>>>>>> I think we have to plan this well.
> >>>>>>>>> I don't like the idea of having the whole code base fragmented into Java
> >>>>>>>>> and Scala code for too long.
> >>>>>>>>>
> >>>>>>>>> I think we should do this one step at a time and focus on migrating one
> >>>>>>>>> module at a time.
> >>>>>>>>> IMO, the easiest start would be to port the runtime to Java.
> >>>>>>>>> Extracting the API classes into their own module, porting them to Java,
> >>>>>>>>> and removing the Scala dependency won't be possible without breaking the
> >>>>>>>>> API, since a few classes depend on the Scala Table API.
> >>>>>>>>>
> >>>>>>>>> Best, Fabian
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrm...@apache.org>:
> >>>>>>>>>
> >>>>>>>>>> I think that is a noble and honorable goal and we should strive for it.
> >>>>>>>>>> This, however, must be an iterative process given the sheer size of the
> >>>>>>>>>> code base. I like the approach of defining common Java modules which are
> >>>>>>>>>> used by more specific Scala modules and slowly moving classes from Scala
> >>>>>>>>>> to Java. Thus +1 for the proposal.
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Till
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <pi...@data-artisans.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I do not have experience with how Scala and Java interact with each
> >>>>>>>>>>> other, so I cannot fully validate your proposal, but generally speaking
> >>>>>>>>>>> +1 from me.
> >>>>>>>>>>>
> >>>>>>>>>>> Does it also mean that we should slowly migrate `flink-table-core` to
> >>>>>>>>>>> Java? How would you envision that? It would be nice to be able to add
> >>>>>>>>>>> new classes/features written in Java so that they can coexist with old
> >>>>>>>>>>> Scala code until we gradually switch from Scala to Java.
> >>>>>>>>>>>
> >>>>>>>>>>> Piotrek
> >>>>>>>>>>>
> >>>>>>>>>>>> On 13 Jun 2018, at 11:32, Timo Walther <twal...@apache.org> wrote:
> >>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>
> >>>>>>>>>>>> as you all know, the Table & SQL API is currently implemented in Scala.
> >>>>>>>>>>>> This decision was made a long time ago when the initial code base was
> >>>>>>>>>>>> created as part of a master's thesis. The community kept Scala because
> >>>>>>>>>>>> of the nice language features that enable a fluent Table API like
> >>>>>>>>>>>> table.select('field.trim()) and because Scala allows for quick
> >>>>>>>>>>>> prototyping (e.g. multi-line strings for code generation). The
> >>>>>>>>>>>> committers enforced not splitting the code base into two programming
> >>>>>>>>>>>> languages.
> >>>>>>>>>>>> However, nowadays the flink-table module is becoming a more and more
> >>>>>>>>>>>> important part of the Flink ecosystem.
> >>>>>>>>>>>> Connectors, formats, and the SQL client are actually implemented in
> >>>>>>>>>>>> Java but need to interoperate with flink-table, which makes these
> >>>>>>>>>>>> modules dependent on Scala. As mentioned in an earlier mail thread,
> >>>>>>>>>>>> using Scala for API classes also exposes member variables and methods
> >>>>>>>>>>>> in Java that should not be exposed to users [1]. Java is still the most
> >>>>>>>>>>>> important API language, and right now we treat it as a second-class
> >>>>>>>>>>>> citizen. I just noticed that you even need to add Scala if you just
> >>>>>>>>>>>> want to implement a ScalarFunction, because of method clashes between
> >>>>>>>>>>>> `public String toString()` and `public scala.Predef.String toString()`.
> >>>>>>>>>>>> Given the size of the current code base, reimplementing the entire
> >>>>>>>>>>>> flink-table code in Java is a goal that we might never reach. However,
> >>>>>>>>>>>> we should at least treat the symptoms and keep this as a long-term goal
> >>>>>>>>>>>> in mind. My suggestion would be to convert user-facing and runtime
> >>>>>>>>>>>> classes and split the code base into multiple modules:
> >>>>>>>>>>>>
> >>>>>>>>>>>> flink-table-java {depends on flink-table-core}
> >>>>>>>>>>>> Implemented in Java. Java users can use this. This would require
> >>>>>>>>>>>> converting classes like TableEnvironment and Table.
> >>>>>>>>>>>>
> >>>>>>>>>>>> flink-table-scala {depends on flink-table-core}
> >>>>>>>>>>>> Implemented in Scala. Scala users can use this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> flink-table-common
> >>>>>>>>>>>> Implemented in Java. Connectors, formats, and UDFs can use this. It
> >>>>>>>>>>>> contains interface classes such as descriptors, table sinks, and table
> >>>>>>>>>>>> sources.
> >>>>>>>>>>>> flink-table-core {depends on flink-table-common and flink-table-runtime}
> >>>>>>>>>>>> Implemented in Scala. Contains the current main code base.
> >>>>>>>>>>>>
> >>>>>>>>>>>> flink-table-runtime
> >>>>>>>>>>>> Implemented in Java. This would require converting classes in
> >>>>>>>>>>>> o.a.f.table.runtime but could potentially improve the runtime.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Timo
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html
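[Editor's note: to make the flink-table-common idea above concrete, here is a hedged sketch of the kind of small, dependency-free Java interfaces such a module could expose to connectors and formats. All names below are hypothetical illustrations, not the real Flink classes.]

```java
// Hypothetical sketch (NOT real Flink classes): a pure-Java "common" module
// would hold only small, dependency-free interfaces that connectors and
// formats implement without touching Scala.
interface SchemaView {
    String[] fieldNames();
}

interface TableSourceSketch {
    SchemaView schema();

    // A Java 8 default method: shared behavior can live in the interface
    // module itself, with no implementation-module dependency.
    default String explainSource() {
        return "TableSource(" + String.join(", ", schema().fieldNames()) + ")";
    }
}

public class CommonModuleSketch {
    public static void main(String[] args) {
        // A connector-side implementation needs only the common interfaces;
        // both schema() and SchemaView are provided here as lambdas.
        TableSourceSketch csv = () -> () -> new String[] {"id", "name"};
        System.out.println(csv.explainSource()); // TableSource(id, name)
    }
}
```

The design point is that a connector compiled against such a module would carry no Scala (or planner) dependency at all.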