There are many reasons to divide code into modules: to allow separate
communities, to allow separate release schedules, to reduce coupling, to
make it easier to contribute (because contributors don’t need to understand
a large code base), and to increase adoption (because people perceive that
they can use component A without using component B).
The last of these was particularly on my mind when we split Avatica from
Calcite. I was pleased to see, for example, Apache Phoenix using Avatica
successfully (including building an ODBC driver) even though their
(separate) attempt to adopt Calcite failed.
Splitting code into modules makes it easier to continue splitting. If
Avatica had remained part of Calcite, both written in Java and using the
same build system and release process, it’s less likely that Avatica-go
would have happened.
I think the split between Avatica and Calcite repositories is working
great.
Julian
On Nov 8, 2021, at 3:56 PM, Josh Elser wrote:
These repositories are separate because they have separate release
schedules. Those who were working on calcite.git typically do not want to
be bothered with changes to calcite-avatica.git, and vice versa. Further,
there are downstream users of Avatica directly (without Calcite) who would
be burdened by waiting for a new Calcite release as opposed to a (much
simpler) Avatica release.
Recently there was a PR that improves error messages in Avatica:
If I am interpreting your analysis correctly, you are arguing for
wanting a shorter cycle to "implementing" all parts of a change (both in
Avatica and Calcite). The net amount of code you have to write doesn't
change with how it works now compared to how you are suggesting it should
work.
If there are breaking changes in Avatica, they would need to be
accounted for when Calcite is bumped to the next version of Avatica.
In terms of wire compatibility, there has been no breaking wire compat
change that wasn't due to a problem with the protocol itself (where we had
to break it). As far as API compatibility goes, I do not believe we have
ever _had_ to break API compatibility. The untracked runtime changes (like
the one you cite) are a bigger smell to me in Calcite but that's a
different conversation to be had.
* Avatica has fewer commits than Calcite, so having a separate
calcite-avatica repository does not help for segregating PR/issue/commit
queue
I disagree with your assessment. Having a separate repository makes it
extremely easy for me to watch Avatica commits/pull-requests which I am
capable of reviewing vs. Calcite pull-requests which I am not comfortable
reviewing.
* calcite-avatica-go seems to reside in its own repository, so I do not
see why we split Java implementations across the calcite and
calcite-avatica repositories
I have no objections to combining these two repositories together.
It seems to me that combining these repositories is one possible
solution to address the friction of making changes across these two
repositories, rather than starting from the root problem "How can we make
co-dependent changes easier?"
Instead of complicating releases (which are already complicated), I would
challenge us to ask: what should the API/compatibility surface of Avatica
be, and how can we close the gaps you've experienced? Such a fix will go
a long way for downstream users of Avatica, too.
- Josh
On 11/8/21 2:39 PM, Vladimir Sitnikov wrote:
Hi,
Currently, we have calcite-avatica and calcite in different
repositories.
Frankly speaking, I do not know what benefit that brings; it does, however,
create points of friction:
1) If a feature touches Avatica and Calcite, then PRs are hard to create
and maintain. We just can't create a single PR across both repositories
2) If we support a single Avatica version only in Calcite, then the point
of having different repositories is even more moot.
3) CI configuration is basically duplicated: every time we want to add a
new JDK (once every 6 months), we have to do it twice
4) There are common dependencies: JUnit, Hamcrest, etc. We basically have
to do the same thing multiple times when upgrading versions in avatica and
calcite
5) Adding @Nullable annotations to Calcite was more complicated than I
wanted because Avatica is stored in a different repository.
I basically had to create a bunch of astub files instead of just putting
the relevant @Nullable annotations on top of Avatica classes:
https://github.com/apache/calcite/tree/f1db79fb876ac9ba3c405283e99bb0438e4e97be/src/main/config/checkerframework/avatica
Recently there was a PR that improves error messages in Avatica:
https://github.com/apache/calcite-avatica/pull/161
I am sure the PR is a great improvement, however, it fails CI in both
cases:
a) Current Avatica fails when it runs integration tests against Calcite
(because Calcite expects old, low-detail exception messages)
b) Current Calcite fails to build with "latest Avatica" because, well,
Avatica produces "too good" exception messages
It surfaces a true problem: we have too tight code integration between
"different" systems, and it probably makes sense to have both libraries in
a single repository.
An alternative option is to make sure Calcite "supports" at least two
Avatica versions: "previous version + one new".
However, the current tests in Calcite expect a specific error message, so
it can't support two alternative messages.
Well, the tests are in .iq format which could probably support multiple
messages, however, I have absolutely no idea how to implement that.
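For illustration only, the "accept either message" idea can be sketched in plain Java (not in .iq format). Everything here is an assumption: the class and method names are hypothetical, the message strings are made up, and nothing like this is claimed to exist in Calcite or Avatica.

```java
import java.util.List;

/** Hypothetical helper; not part of Calcite or Avatica. */
final class ErrorMessageCompat {
  private ErrorMessageCompat() {}

  /**
   * Returns true if the actual error message contains at least one of the
   * accepted variants, so a test can pass against either message format.
   */
  static boolean matchesAny(String actual, List<String> acceptedVariants) {
    if (actual == null) {
      return false;
    }
    for (String variant : acceptedVariants) {
      if (actual.contains(variant)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Made-up strings standing in for the "old" low-detail message and the
    // "new" more detailed one.
    List<String> accepted = List.of(
        "Table 'T' not found",
        "Object 'T' not found within schema 'S'");

    System.out.println(matchesAny("Error: Table 'T' not found", accepted));
    System.out.println(
        matchesAny("Error: Object 'T' not found within schema 'S'", accepted));
    System.out.println(matchesAny("Error: syntax error", accepted));
  }
}
```

A test written this way would tolerate "previous version + one new", at the cost of listing both wordings until the older Avatica version is dropped.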
Facts so far:
* Avatica has fewer commits than Calcite, so having a separate
calcite-avatica repository does not help for segregating PR/issue/commit
queue
* Calcite seems to support one specific Avatica version only, so it makes
sense to just keep them in a single repository
* calcite-avatica-go seems to reside in its own repository, so I do not
see why we split Java implementations across the calcite and
calcite-avatica repositories
* There is non-trivial maintenance overhead (see 1..5 above). Frankly
speaking, I was trying my best to **avoid** maintaining calcite-avatica.
Somebody wanted to go into a separate repository, so, I let them do what
they want there.
However, there are cases when I have to spend extra time because
calcite-avatica is a separate repository (PR#161 and @Nullable are the
recent examples)
* It looks like I broke the build by merging PR#161. That is why I am
trying to roll the thing forward and bring up this discussion.
An alternative option is that I revert the merge and wait for somebody else
to pick up the task.
So my questions are:
Q1) Does having calcite-avatica as a separate repository do anybody any
good?
Q2) Does anybody object to merging calcite-avatica and calcite into a
single calcite repository?
Vladimir