I have seen ongoing work on upgrading dependencies. This is a great and
much-needed task for the health of the project and its IO connectors, but I am
a bit worried about the impact of these upgrades on existing users. We should
be aware that we support old versions of the clients for valid reasons. If we
update a client version, we should ensure that it still interacts correctly
with existing users and runtime systems. Basically we need two conditions:

1. We cannot update dependencies without considering the current use of them.
2. We must avoid upgrading to a non-stable or non-LTS dependency version.

For (1), in a recent thread Piotr brought up some issues about updating Hadoop
dependencies to version 3. This surprised me because the whole Big Data
ecosystem is just catching up with Hadoop 3 (Flink does not even release
artifacts for this yet, and Spark just started on version 3 some months ago),
which means that most of our users still need us to guarantee compatibility
with Hadoop 2.x dependencies.

The Hadoop dependencies are mostly 'provided', so a way to achieve this is to
create new test configurations that guarantee backwards (or forwards)
compatibility by providing the respective versions. This is similar to what we
currently do in KafkaIO, where we use version 1.0.0 by default but test
compatibility with 2.1.0 by providing the right dependencies too.
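
As a rough sketch of what such a test configuration could look like (Gradle
Kotlin DSL; the module, the `hadoopVersions` list, the configuration and task
names, and the concrete version numbers are all hypothetical and not taken from
Beam's actual build):

```kotlin
// build.gradle.kts of a hypothetical IO module (illustrative only)
plugins { java }

repositories { mavenCentral() }

// Assumed values: the default version we compile against, plus the
// versions we want to verify compatibility with.
val defaultHadoop = "2.10.0"
val hadoopVersions = listOf("2.10.0", "3.2.1")

dependencies {
    // 'provided' style: needed to compile, supplied by the user at runtime.
    compileOnly("org.apache.hadoop:hadoop-client:$defaultHadoop")
    testCompileOnly("org.apache.hadoop:hadoop-client:$defaultHadoop")
    testImplementation("junit:junit:4.13.1")
}

hadoopVersions.forEach { version ->
    val prefix = "hadoop" + version.replace(".", "")

    // One resolvable configuration per Hadoop version under test.
    val conf = configurations.create("${prefix}Test") {
        extendsFrom(configurations.testImplementation.get())
    }
    dependencies.add(conf.name, "org.apache.hadoop:hadoop-client:$version")

    // Run the existing test classes with that version on the classpath.
    tasks.register<Test>("${prefix}Test") {
        classpath = conf + sourceSets["main"].output + sourceSets["test"].output
        testClassesDirs = sourceSets["test"].output.classesDirs
    }
}
```

Under these assumptions, `./gradlew hadoop321Test` would run the same tests
with Hadoop 3.2.1 provided instead of the default version.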

The same thread also discusses upgrading to the latest version, 3.3.x, but per
(2) we should not consider upgrades to non-stable versions. The current stable
version of Hadoop is 3.2.1: https://hadoop.apache.org/docs/stable/

I also saw a recent upgrade of SolrIO to version 8, which may affect some users
of previous versions, with no discussion about it on the mailing lists and no
backwards compatibility guarantees:
https://github.com/apache/beam/pull/13027

In the Solr case I think this update probably makes more sense, since Solr 5.x
is deprecated and fewer people would probably be impacted, but it would still
have been good to discuss this on user@.

I don't know how we can find a good balance between maintainers and users
deciding on those upgrades without adding much overhead. Should we maybe have
a VOTE for the most sensitive dependencies? Or should we just assume this is a
matter for the maintainers' judgment? I am afraid we may end up with
incompatible changes due to lack of awareness, or with little in return, but at
the same time I wonder if it makes sense to add the extra work of a discussion
for minor dependencies where this matters less.

Should we maybe document the sensitive dependency upgrades (the recent thread
on the Avro upgrade comes to mind too)? Or should we have the same criteria for
all? Other ideas?
