One additional point I forgot to mention: if we only had portable runners, our BOM story would be simpler, since the runner would not be on the classpath and users would have a consistent experience across runners with regard to dependency convergence.
On Fri, Oct 23, 2020 at 6:15 AM Piotr Szuberski <[email protected]> wrote:

> Thank you for pointing it out. The awareness problem fits me well here - I
> have learned a good lesson to discuss things on the dev list.
>
> About SolrIO - I'll create a thread on user@ to discuss which versions
> should be supported and make the relevant changes after reaching a
> conclusion.
>
> On 2020/10/22 14:24:45, Ismaël Mejía <[email protected]> wrote:
> > I have seen ongoing work on upgrading dependencies. This is a great task,
> > needed for the health of the project and its IO connectors, but I am a
> > bit worried about the impact of these upgrades on existing users. We
> > should be aware that we support old versions of the clients for valid
> > reasons. If we update the version of a client, we should ensure that it
> > still interacts correctly with existing users and runtime systems.
> > Basically we need two conditions:
> >
> > 1. We cannot update dependencies without considering their current use.
> > 2. We must avoid upgrading to a non-stable or non-LTS dependency version.
> >
> > For (1), in a recent thread Piotr brought up some issues about updating
> > the Hadoop dependencies to version 3. This surprised me because the whole
> > Big Data ecosystem is just catching up with Hadoop 3 (Flink does not even
> > release artifacts for it yet, and Spark only started on version 3 some
> > months ago), which means that most of our users still need us to
> > guarantee compatibility with Hadoop 2.x dependencies.
> >
> > The Hadoop dependencies are mostly 'provided', so one way to achieve this
> > is by creating new test configurations that guarantee backwards (or
> > forwards) compatibility by providing the respective versions. This is
> > similar to what we currently do in KafkaIO: we use version 1.0.0 by
> > default but also test compatibility with 2.1.0 by providing the right
> > dependencies.
> >
> > The same thread also discusses upgrading to version 3.3.x, the latest,
> > but per (2) we should not consider upgrades to non-stable versions; the
> > current stable version of Hadoop is 3.2.1.
> > https://hadoop.apache.org/docs/stable/
> >
> > I also saw a recent upgrade of SolrIO to version 8, which may affect some
> > users of previous versions, with no discussion about it on the mailing
> > lists and no backwards compatibility guarantees.
> > https://github.com/apache/beam/pull/13027
> >
> > In the Solr case this update probably makes more sense, since Solr 5.x is
> > deprecated and fewer people would likely be impacted, but it would still
> > have been good to discuss it on user@.
> >
> > I don't know how we can find a good equilibrium between maintainers and
> > users when deciding on those upgrades without adding much overhead.
> > Should we maybe have a VOTE for the most sensitive dependencies? Or
> > should we just leave this as a criterion for the maintainers? I am afraid
> > we may end up with incompatible changes due to a lack of awareness, and
> > for not much in return, but at the same time I wonder if it makes sense
> > to add the extra work of a discussion for minor dependencies where this
> > matters less.
> >
> > Should we maybe document the sensitive dependency upgrades (the recent
> > thread on the Avro upgrade comes to my mind too)? Or should we apply the
> > same criteria to all of them? Other ideas?
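The per-version test-configuration approach Ismaël describes for KafkaIO and Hadoop can be sketched in Gradle. This is only an illustrative sketch, not Beam's actual build code: the configuration names, task names, and Hadoop versions below are assumptions chosen for the example.

```groovy
// Hypothetical sketch: test one IO connector against multiple versions of a
// 'provided' dependency (Hadoop here). Names and versions are illustrative.
def hadoopVersions = ["2.10.1", "3.2.1"]

hadoopVersions.each { hv ->
  def suffix = hv.replace('.', '')
  // A dedicated configuration pins the provided dependencies to one version.
  def conf = configurations.create("hadoopVersion${suffix}")
  dependencies.add(conf.name, "org.apache.hadoop:hadoop-common:${hv}")
  dependencies.add(conf.name, "org.apache.hadoop:hadoop-client:${hv}")

  // One test task per Hadoop version: same test sources, but the pinned
  // configuration is placed first on the classpath so its versions win.
  tasks.register("hadoopVersion${suffix}Test", Test) {
    group = "Verification"
    description = "Runs the tests against Hadoop ${hv}"
    classpath = conf + sourceSets.test.runtimeClasspath
    testClassesDirs = sourceSets.test.output.classesDirs
  }
}
```

With something like this, `./gradlew hadoopVersion2101Test hadoopVersion321Test` would exercise the same test suite against both dependency lines, which is the backwards/forwards compatibility guarantee the email argues for.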
