I have seen ongoing work on upgrading dependencies. This is a great and much-needed task for the health of the project and its IO connectors; however, I am a bit worried about the impact of these upgrades on existing users. We should be aware that we support old versions of the clients for valid reasons. If we update the version of a client, we should ensure that it still interacts correctly with existing users and runtime systems. Basically we need two conditions:
1. We cannot update dependencies without considering how they are currently used.
2. We must avoid upgrading to a non-stable or non-LTS dependency version.

For (1), in a recent thread Piotr raised some issues about updating the Hadoop dependencies to version 3. This surprised me because the whole Big Data ecosystem is just catching up with Hadoop 3 (Flink does not even release artifacts for it yet, and Spark only started on version 3 some months ago), which means that most of our users still need us to guarantee compatibility with Hadoop 2.x dependencies. The Hadoop dependencies are mostly 'provided', so one way to achieve this is to create new test configurations that guarantee backwards (or forwards) compatibility by providing the respective versions. This is similar to what we currently do in KafkaIO, where we use version 1.0.0 by default but also test compatibility with 2.1.0 by providing the right dependencies.

The same thread also discusses upgrading to 3.3.x, the latest version, but per (2) we should not consider upgrades to non-stable versions; the current stable Hadoop release is 3.2.1.
https://hadoop.apache.org/docs/stable/

I also saw a recent upgrade of SolrIO to version 8, which may affect some users of previous versions, with no discussion about it on the mailing lists and no backwards compatibility guarantees.
https://github.com/apache/beam/pull/13027

In the Solr case I think this update probably makes more sense, since Solr 5.x is deprecated and fewer people would likely be impacted, but it still would have been good to discuss it on user@.

I don't know how we can find a good equilibrium between maintainers and users in deciding on those upgrades without adding much overhead. Should we maybe have a VOTE for the most sensitive dependencies? Or should we just assume this is a decision for the maintainers? I am afraid we may end up with incompatible changes due to a lack of awareness, or for not much in return, but at the same time I wonder if it makes sense to add the extra work of a discussion for minor dependencies where this matters less.

Should we maybe document the sensitive dependency upgrades (the recent thread on the Avro upgrade comes to mind too)? Or should we have the same criteria for all? Other ideas?
