Hi colleagues,

Thank you for the valuable feedback. The problem is indeed complex. I share
the worry that complete decoupling might be too disruptive for users: they
would hit compilation problems when migrating to the newer version and would
have to update their dependencies, which could itself be problematic (e.g.
due to security concerns). So I'd like to propose a slightly different
approach that should not cause any problems for existing users. We change
the goal from complete decoupling to the *isolation* of dependent classes.

Let me explain it with Avatica as an example. There are two classes of
Avatica-related dependencies in the core: (1) utilities (e.g. classes from
org.apache.calcite.avatica.util), and (2) logic (e.g. classes from
org.apache.calcite.jdbc, org.apache.calcite.adapter.jdbc). The first class
is very easy to eliminate. The second class cannot be eliminated without a
serious repackaging of the whole of Calcite. So we can do the following:

1. Introduce the "commons" module, and move utilities there, thus solving
(1).
2. Shade the "commons" module into the "core" during the build - if we do
this, the existing users will not have to change their dependencies, so
this is a critically important step (at least for now). An alternative to
this is just to copy-paste utility classes into the "core" module,
violating DRY.
3. Confine the remaining Avatica dependencies to a couple of JDBC-related
packages, and add a static analysis rule to disallow Avatica classes in any
other package. This may require some advanced refactoring (e.g. of
CalciteConnectionConfig).
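
To make the rule in step 3 more concrete, here is a minimal sketch in plain
Java. Everything in it is an assumption for illustration: the class name
AvaticaImportCheck is hypothetical, the allow-list simply repeats the two
JDBC-related packages mentioned above, and the check only looks at import
statements (it would miss fully qualified references). A real build would
more likely rely on an existing static analysis tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check: reports Avatica imports outside an allow-list of packages. */
final class AvaticaImportCheck {
  // Assumption: the only packages allowed to keep Avatica dependencies.
  static final List<String> ALLOWED_PACKAGES = List.of(
      "org.apache.calcite.jdbc",
      "org.apache.calcite.adapter.jdbc");

  static final String FORBIDDEN_PREFIX = "org.apache.calcite.avatica.";

  private static final Pattern PACKAGE =
      Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
  private static final Pattern IMPORT =
      Pattern.compile("^\\s*import\\s+(?:static\\s+)?([\\w.*]+)\\s*;", Pattern.MULTILINE);

  /** True if the package is on the allow-list or is a sub-package of an entry. */
  static boolean isAllowedPackage(String pkg) {
    for (String allowed : ALLOWED_PACKAGES) {
      if (pkg.equals(allowed) || pkg.startsWith(allowed + ".")) {
        return true;
      }
    }
    return false;
  }

  /** Returns the Avatica imports found in a source file that is not allowed to have them. */
  static List<String> violations(String source) {
    Matcher pkg = PACKAGE.matcher(source);
    String packageName = pkg.find() ? pkg.group(1) : "";
    List<String> result = new ArrayList<>();
    if (isAllowedPackage(packageName)) {
      return result; // Avatica is permitted here, nothing to report
    }
    Matcher imp = IMPORT.matcher(source);
    while (imp.find()) {
      if (imp.group(1).startsWith(FORBIDDEN_PREFIX)) {
        result.add(imp.group(1));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String planner = "package org.apache.calcite.plan;\n"
        + "import org.apache.calcite.avatica.util.Spaces;\n"
        + "class Example {}\n";
    String jdbc = "package org.apache.calcite.jdbc;\n"
        + "import org.apache.calcite.avatica.AvaticaConnection;\n"
        + "class Example {}\n";
    System.out.println("plan package: " + violations(planner));
    System.out.println("jdbc package: " + violations(jdbc));
  }
}
```

The allow-list is a prefix match on purpose, so sub-packages of the
JDBC-related packages are also permitted; whether that is desirable is open
for discussion.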

As a result, the Avatica dependency is confined to a handful of packages, and
existing applications will work mostly seamlessly during migration. Now we
can do one of two things:
1. Either create a separate reduced artifact "core-reduced" without
Avatica-dependent packages.
2. Or, since many products shade Calcite during the build, advise them
to just exclude Avatica-dependent packages when shading.

How does it sound?

Regards,
Vladimir


On Wed, Nov 25, 2020 at 10:48, Chunwei Lei <[email protected]> wrote:

> I like the idea. But I have the same worry as Haisheng.
>
>
> Best,
> Chunwei
>
>
> On Wed, Nov 25, 2020 at 3:07 PM Xin Wang <[email protected]> wrote:
>
> > +1 for this idea. We only use the parser/optimizer part.
> >
> > On Wed, Nov 25, 2020 at 2:38 PM JiaTao Tao <[email protected]> wrote:
> >
> > > +1 for this idea. I have been developing Calcite for a long time
> > > (counting my time on project Kylin). We all treat Calcite as an
> > > optimizer, but we need to consider the overhead.
> > >
> > > I agree with Stamatis: "since those dependencies were not causing any
> > > real trouble."
> > >
> > >
> > > What really troubles me is that when we do something at the logical
> > > level, we may have to consider the implementation. For example, we
> > > used to keep "In" rather than convert it to a join or "OR", but
> > > Calcite has no implementation of "In".
> > >
> > >
> > > Regards!
> > >
> > > Aron Tao
> > >
> > >
> > >
> > > On Wed, Nov 25, 2020 at 12:57 PM Haisheng Yuan <[email protected]> wrote:
> > >
> > > > > I would like to propose to decouple the "core" module from
> > > > > "linq4j" and Avatica.
> > > > I like the idea.
> > > >
> > > > Moving Enumerable out of core may be time consuming and disruptive,
> > > > because many core tests are using Enumerable to verify plan quality
> > > > and correctness.
> > > >
> > > > Best,
> > > > Haisheng
> > > >
> > > > On 2020/11/24 23:42:19, Stamatis Zampetakis <[email protected]> wrote:
> > > > > Hi Vladimir,
> > > > >
> > > > > Personally, I like the idea.
> > > > > I had similar thoughts in the past but I didn't try to break it
> > > > > down since those dependencies were not causing any real trouble.
> > > > >
> > > > > Let's see what the others think.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > >
> > > > > On Tue, Nov 24, 2020 at 7:30 PM Vladimir Ozerov <[email protected]> wrote:
> > > > >
> > > > > > Hi colleagues,
> > > > > >
> > > > > > Many Calcite integrations use only part of the framework.
> > > > > > Specifically, it is common to use only the parser/optimizer
> > > > > > part. JDBC and runtime are used less frequently because they
> > > > > > are not very well suited for mature processing engines (e.g.
> > > > > > Enumerable runs out of memory easily).
> > > > > >
> > > > > > However, in order to use the parser/optimizer from the core
> > > > > > module, you also need to add the "linq4j" and Avatica modules
> > > > > > to the classpath, which is not convenient - why include
> > > > > > modules that you do not use?
> > > > > >
> > > > > > It turns out that most of the dependencies are indeed leaky
> > > > > > abstractions that could be decoupled easily. For example, the
> > > > > > RelOptUtil class from the "core" depends on ... two string
> > > > > > constants from the Avatica module.
> > > > > >
> > > > > > I would like to propose to decouple the "core" module from
> > > > > > "linq4j" and Avatica. For example, we may introduce the new
> > > > > > "common" module that will hold common constants, utility
> > > > > > classes, and interfaces (e.g. Meta). Then, we can organize the
> > > > > > dependencies like this:
> > > > > > common -> core
> > > > > > common -> linq4j
> > > > > > common -> Avatica
> > > > > >
> > > > > > Finally, we may shade and relocate the "common" module into the
> > > > > > "core" during the build. In the end, we will have two fewer
> > > > > > runtime dependencies with relatively little effort. In
> > > > > > principle, the same approach could be applied to the Janino and
> > > > > > Jackson dependencies, but it could be more complex, so my
> > > > > > proposal is only about "linq4j" and Avatica.
> > > > > >
> > > > > > How do you feel about it? Does this proposal make sense to the
> > > > > > community? If yes, I can try implementing the POC for this.
> > > > > >
> > > > > > Regards,
> > > > > > Vladimir.
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Thanks,
> > Xin
> >
>
