Hi all,

The PR for Docker images [1] has received some good feedback so far, but
also, unsurprisingly, some puzzlement. I started this email thread
precisely because I anticipated that the PR would raise some eyebrows.

Let me clarify what I think is probably the most controversial part of the
PR: how EclipseLink will be handled going forward.

First of all, I would like to stress two simple facts: this is not about
Quarkus vs Dropwizard, and this is not about EclipseLink vs Hibernate or
something else. These are orthogonal to the real issue: how to properly
provide OSS Polaris users with *usable* Docker images and binary
distributions.

The status quo is this: we currently have a single Dockerfile that serves
all kinds of purposes: building Polaris for evaluation and prototyping;
building Polaris for regression tests; and, likely, building Polaris for
production (!!). That Dockerfile has security flaws outlined elsewhere [2].
But most importantly, since EclipseLink is merely a build argument left for
users to deal with, keeping this setup would send our users a really
unfortunate message: that we are not serious about providing them with a
usable product.

That's where the PR comes in. In this PR, I am proposing that we *bundle
EclipseLink in the Docker images and distributions, along with two JDBC
drivers: H2 and PostgreSQL*. This is not a technical decision, but a
pragmatic one: most users will want EclipseLink (again: *for now, since
there is no other choice*), and most users will want Postgres. Doing so
gives our users a much more usable product, and is much more in line with
what similar projects do, Nessie being a good example.

Also, including these drivers in the official distro sends one important
message to users: *that we are committed to supporting them*. In the
current status quo, users could fear that issues with EclipseLink +
Postgres (or other databases) wouldn't receive any support, since Polaris
ships no database driver other than H2.

I think more JDBC drivers could be added in the future, but we should start
with the most common ones and then see how it goes.

Should we then end up including 136 JDBC drivers in Polaris distros? No, of
course not. But if there is strong demand for a specific driver, and its
license is ASF-compatible, then we can consider adding it. OTOH, a user
asking Polaris to support their exotic, niche database should instead be
directed to build their own Polaris distro, and we should provide guidance
on how to do so.

Are there other options?

We could do nothing. In that case, Polaris OSS distros and Docker images
would by default contain only the in-memory metastore. I personally think
this is the worst option. Remember: *Quarkus-based distros cannot be
augmented with extra jars at runtime*. IOW, every user of OSS Polaris who
needs a persistent metastore would have to build their own distro.
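Because drivers can only be baked in at build time, it helps to be able to
check what a given distro actually contains. Below is a minimal sketch,
assuming bash and the `unzip` tool. The function name `list_jdbc_drivers`
is hypothetical, and the jar-name patterns assume Quarkus names bundled
dependency jars after their Maven coordinates (so the drivers would contain
"h2database" or "postgresql"):

```shell
# Hypothetical helper (not part of Polaris): reads archive entry names on
# stdin, one per line, and prints only the jars that look like the H2 or
# PostgreSQL JDBC drivers. The name patterns are an assumption.
list_jdbc_drivers() {
  # grep exits with status 1 when nothing matches; "|| true" keeps an empty
  # result from being treated as an error by callers using "set -e".
  grep -E '\.jar$' | grep -iE 'h2database|postgresql' || true
}
```

For example, `unzip -Z1 quarkus/server/build/distributions/polaris-quarkus-server-1.0.0-incubating-SNAPSHOT.zip | list_jdbc_drivers`
lists the driver jars in a locally built zip. An empty result means the
distro was built without them, and no runtime setting can add them back.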

We could also create distinct Docker images and distros for each database
we want to officially support, e.g. an `apache/polaris-postgres` image, an
`apache/polaris-mysql` image, etc. But this wouldn't scale, and would make
the release process a nightmare, not to mention the combined size of all
these artifacts, which would need to be pushed to docker.io (multiplied by
the number of platforms we support) and to other download sites, then
synced to other registries, etc. And each of these images/distros would
need to be tested and maintained. This is not a good option.

I hope that the (admittedly long) explanation above clarifies the rationale
behind the PR. I am open to suggestions, and looking forward to hearing
your thoughts, here or in the PR.

Thanks,

Alex

[1] https://github.com/apache/polaris/pull/610
[2] https://github.com/apache/polaris/issues/537

On Wed, Jan 15, 2025 at 5:06 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Alex
>
> I agree with the Docker changes; they sound good to me.
> Removing the root Dockerfile in favor of the "Quarkus" approach is way
> better imho.
>
> Regarding EclipseLink and drivers, it makes sense to start with
> PostgreSQL and H2. Depending on how we move forward regarding
> EclipseLink (if we keep it or not), we can add additional drivers (if
> they are ASF compliant as you said).
>
> The binary distribution looks good to me (as said in another email,
> assuming we have clean LICENSE/NOTICE for this distribution).
>
> Thanks !
> Regards
> JB
>
>
> On Tue, Jan 14, 2025 at 3:18 PM Alex Dutra
> <alex.du...@dremio.com.invalid> wrote:
> >
> > Hi all,
> >
> > As mentioned already in another email earlier today, we still have a few
> > PRs to merge to fully achieve the transition to Quarkus.
> >
> > The first one is the Docker PR: https://github.com/apache/polaris/pull/610
> >
> > Since it's an important topic, I would like to summarize the most
> > important changes here:
> >
> > 1) The Dockerfile at the root of the repo is gone. It's not needed
> > anymore, and it has security flaws (see
> > https://github.com/apache/polaris/issues/537 for details). If users want
> > to build a local image for ad hoc testing, they can now simply build
> > Polaris with:
> >
> >   ./gradlew assemble -Dquarkus.container-image.build=true
> >
> > 2) The docker-compose file for regression tests has been moved to the
> > regtests folder. Regression tests can be run with:
> >
> >   ./gradlew assemble -Dquarkus.container-image.build=true
> >   docker compose -f regtests/docker-compose.yml up --build --exit-code-from regtest
> >
> > 3) EclipseLink and JDBC drivers
> >
> > This is perhaps the most important change. With Quarkus, it's impossible
> > to add jars at runtime. Therefore, Polaris server artifacts *must* be
> > built with all the required dependencies.
> >
> > From now on, the polaris-eclipse-link module includes 2 JDBC drivers: H2
> > and Postgres.
> >
> > Also from now on, the polaris-quarkus-server artifacts (distribution
> > tarballs, zips and the Docker images) will contain the 2 drivers.
> >
> > From the legal perspective, both drivers are compatible with ASF
> > guidelines, and the license and notice files were updated accordingly.
> >
> > I would argue that this change greatly simplifies the status quo by no
> > longer requiring users to include their own JDBC jars, and makes Polaris
> > with EclipseLink "just work". There is also no longer any need to modify
> > the Dockerfiles to include EclipseLink dependencies via build args.
> >
> > We can add more JDBC drivers in the future if we want; for example, the
> > MariaDB JDBC driver is compatible with ASF and connects to both MySQL and
> > MariaDB databases.
> >
> > 4) Future binary releases
> >
> > With this PR, the distribution tarballs and zips generated for
> > polaris-quarkus-server are valid release artifacts. You can generate the
> > distribution artifacts with:
> >
> >   ./gradlew clean build
> >
> > You can unpack and run the distribution zip with the following commands:
> >
> >   cd quarkus/server/build/distributions
> >   unzip polaris-quarkus-server-1.0.0-incubating-SNAPSHOT.zip
> >   cd polaris-quarkus-server-1.0.0-incubating-SNAPSHOT
> >   java -jar quarkus-run.jar
> >
> > However, this PR intentionally leaves the question of Docker image
> > releases for a follow-up discussion, mainly because we still need to
> > figure out our strategy wrt which registries we want to push to, and
> > which platforms we want to support.
> >
> > --
> >
> > I am eager to get the community's feedback on this, either in this email
> > thread or in the PR directly.
> >
> > Thanks,
> >
> > Alex
>
