I think it is still worthwhile to keep Avatica and Calcite as two separate repositories. I remember joining this list a few years ago, around the time Avatica was moved into its own repository, and my perception of the components was:
- Avatica: an HTTP server that you can put in front of your database so that you can easily write or generate a client in any language to talk to the database.
- Calcite: a SQL engine used to build the SQL part of your database.

From this perspective, it does not seem that the two components should be so tightly coupled or live in the same repository. Each should be able to stand on its own and have its own independent release schedule.

Having said that, I've seen PRs where a single change needs to touch both repositories, and that seems like a code smell to me. Perhaps we should look at why these two seemingly separate components are so tightly coupled and what we can do to remove the coupling. Maybe this means moving some code from Avatica to Calcite, or vice versa, to sever the coupling so that both components are truly independent and a change won't need to touch both.

On 9/11/2021 12:10 pm, xiong duan wrote:
I have no opinion on whether to merge the two repositories.
But if we don't merge them, then regarding CALCITE-4877, maybe we can:
1) Stop testing Calcite's master branch directly against Avatica's master
branch (doing that is unreasonable).

When an issue needs PRs in both repositories, we can try:
1) If the PR is good, we can merge the Avatica PR at any time (because it
goes into a new version).
2) But the Calcite PR *needs to include the new Avatica version* (not yet
released, but enough to run the CI jobs in this PR and make sure the PR is
good).
3) We wait until Calcite is ready to upgrade its Avatica dependency version,
and only then merge it. (For this last step, we create a Jira issue to
record which PRs still need to be merged.)

On Tue, Nov 9, 2021 at 8:43 AM, Julian Hyde wrote:

There are many reasons to divide code into modules: to allow separate
communities, to allow separate release schedules, to reduce coupling, to
make it easier to contribute (because contributors don’t need to understand
a large code base), and to increase adoption (because people perceive that
they can use component A without using component B).

The last of these was particularly on my mind when we split Avatica from
Calcite. I was pleased to see, for example, Apache Phoenix using Avatica
successfully (including building an ODBC driver) even though their
(separate) attempt to adopt Calcite failed.

Splitting code into modules makes it easier to keep splitting. If
Avatica had remained part of Calcite, both written in Java and using the
same build system and release process, it’s less likely that Avatica-go
would have happened.

I think the split between Avatica and Calcite repositories is working
great.

Julian


On Nov 8, 2021, at 3:56 PM, Josh Elser wrote:

These repositories are separate because they have separate release
schedules. Those who were working on calcite.git typically do not want to
be bothered with changes to calcite-avatica.git, and vice versa. Further,
there are downstream users of Avatica directly (without Calcite) who would
be burdened by waiting for a new Calcite release as opposed to a (much
simpler) Avatica release.

Recently there was a PR that improves error messages in Avatica:

If I am interpreting your analysis correctly, you are arguing for a
shorter cycle for "implementing" all parts of a change (both in Avatica and
Calcite). The net amount of code you have to write doesn't change between
how it works now and how you are suggesting it should work.

If there are breaking changes in Avatica, they would need to be
accounted for when Calcite is bumped to the next version of Avatica.

In terms of wire compatibility, there has been no breaking wire compat
change that wasn't due to a problem with the protocol itself (where we had
to break it). As far as API compatibility goes, I do not believe we have
ever _had_ to break API compatibility. The untracked runtime changes (like
the one you cite) are a bigger smell to me in Calcite, but that's a
different conversation to be had.

* Avatica has fewer commits than Calcite, so having a separate
calcite-avatica repository does not help segregate the PR/issue/commit
queue

I disagree with your assessment. Having a separate repository makes it
extremely easy for me to watch Avatica commits/pull requests, which I am
capable of reviewing, as opposed to Calcite pull requests, which I am not
comfortable reviewing.

* calcite-avatica-go seems to reside in its own repository, so I do not
see why we split the Java implementations across the calcite and
calcite-avatica repositories

I have no objections to combining these two repositories together.

It seems to me that combining these repositories is one possible
solution to address the friction of making changes across these two
repositories, rather than starting from the root problem "How can we make
co-dependent changes easier?"

Instead of complicating releases (which are already complicated), I would
challenge us to ask: what should the API/compatibility surface of Avatica
be, and how can we close the gaps you've experienced? Such a fix will go
a long way for downstream users of Avatica, too.

- Josh

On 11/8/21 2:39 PM, Vladimir Sitnikov wrote:
Hi,
Currently, we have calcite-avatica and calcite in different
repositories.
Frankly speaking, I do not know what that brings us; however, it does
create points of friction:
1) If a feature touches both Avatica and Calcite, then PRs are hard to
create and maintain. We just can't create a single PR across both
repositories.
2) If Calcite supports only a single Avatica version, then the point of
having different repositories is even more moot.
3) CI configuration is basically duplicated: every time we want to add a
new JDK (once every 6 months), we have to do it twice.
4) There are common dependencies: JUnit, Hamcrest, etc. We basically have
to do the same thing multiple times when upgrading versions in Avatica and
Calcite.
5) Adding @Nullable annotations to Calcite was more complicated than I
wanted because Avatica is stored in a different repository.
I basically had to create a bunch of .astub files instead of just putting
the relevant @Nullable annotations on top of Avatica classes:

https://github.com/apache/calcite/tree/f1db79fb876ac9ba3c405283e99bb0438e4e97be/src/main/config/checkerframework/avatica

Recently there was a PR that improves error messages in Avatica:
https://github.com/apache/calcite-avatica/pull/161
I am sure the PR is a great improvement; however, it fails CI in both
cases:
a) Current Avatica fails when it runs integration tests against Calcite
(because Calcite expects the old, low-detail exception messages).
b) Current Calcite fails to build with "latest Avatica" because, well,
Avatica produces "too good" exception messages.
It surfaces a real problem: the code integration between these "different"
systems is too tight, and it probably makes sense to have both libraries in
a single repository.
An alternative option is to make sure Calcite "supports" at least two
Avatica versions: "previous version + one new".
However, the current tests in Calcite expect a specific error message, so
they can't accept two alternative messages.
Well, the tests are in .iq format, which could probably support multiple
messages; however, I have absolutely no idea how to implement that.
Facts so far:
* Avatica has fewer commits than Calcite, so having a separate
calcite-avatica repository does not help segregate the PR/issue/commit
queue
* Calcite seems to support only one specific Avatica version, so it makes
sense to just keep them in a single repository
* calcite-avatica-go seems to reside in its own repository, so I do not
see why we split the Java implementations across the calcite and
calcite-avatica repositories
* There is non-trivial maintenance overhead (see 1-5 above). Frankly
speaking, I was trying my best to **avoid** maintaining calcite-avatica.
Somebody wanted to move it into a separate repository, so I let them do
what they wanted there.
However, there are cases when I have to spend extra time because
calcite-avatica is a separate repository (PR#161 and the @Nullable
annotations are recent examples)
* It looks like I broke the build by merging PR#161. That is why I am
trying to roll the thing forward and bring up this discussion.
The alternative is that I revert the merge and wait for somebody else to
pick up the task.
So my questions are:
Q1) Does having calcite-avatica as a separate repository do anybody any
good?
Q2) Does anybody object to merging calcite-avatica and calcite into a
single calcite repository?
Vladimir


