On Fri, Oct 23, 2020 at 10:16 AM Luke Cwik <[email protected]> wrote:
>
> An additional thing I forgot to mention was that if we only had portable
> runners, our BOM story would be simplified since we wouldn't have the
> runner on the classpath, and users would have a consistent experience
> across runners with regards to dependency convergence.
While that may be true in principle, I think that once we move everything
over to portable runners there will still be a strong desire to use
"embedded" rather than "docker" environments for pure-Java use cases,
which would require compatible classpaths.

On Fri, Oct 23, 2020 at 6:15 AM Piotr Szuberski <[email protected]>
wrote:
>>
>> Thank you for pointing it out. The awareness problem applies to me well
>> here - a good lesson to discuss things on the dev list first.
>>
>> About SolrIO - I'll create a thread on @users to discuss which versions
>> should be supported and make the relevant changes after reaching a
>> conclusion.
>>
>> On 2020/10/22 14:24:45, Ismaël Mejía <[email protected]> wrote:
>> > I have seen ongoing work on upgrading dependencies. This is a great
>> > task, needed for the health of the project and its IO connectors, but
>> > I am a bit worried about the impact of these upgrades on existing
>> > users. We should be aware that we support old versions of the clients
>> > for valid reasons. If we update the version of a client, we should
>> > ensure that it still interacts correctly with existing users and
>> > runtime systems. Basically we need two conditions:
>> >
>> > 1. We cannot update dependencies without considering their current use.
>> > 2. We must avoid upgrading to a non-stable or non-LTS dependency
>> > version.
>> >
>> > For (1), in a recent thread Piotr brought up some issues about
>> > updating the Hadoop dependencies to version 3. This surprised me
>> > because the whole Big Data ecosystem is just catching up with Hadoop 3
>> > (Flink does not even release artifacts for it yet, and Spark only
>> > started on version 3 some months ago), which means that most of our
>> > users still need us to guarantee compatibility with the Hadoop 2.x
>> > dependencies.
>> >
>> > The Hadoop dependencies are mostly 'provided', so one way to achieve
>> > this is by creating new test configurations that guarantee backwards
>> > (or forwards) compatibility by providing the respective versions. This
>> > is similar to what we currently do in KafkaIO: we use version 1.0.0 by
>> > default but test compatibility with 2.1.0 by providing the right
>> > dependencies too.
>> >
>> > The same thread also discusses upgrading to the latest version, 3.3.x,
>> > but per (2) we should not consider upgrades to non-stable versions,
>> > and the current stable version of Hadoop is 3.2.1.
>> > https://hadoop.apache.org/docs/stable/
>> >
>> > I also saw a recent upgrade of SolrIO to version 8 which may affect
>> > some users of previous versions, with no discussion about it on the
>> > mailing lists and no backwards compatibility guarantees.
>> > https://github.com/apache/beam/pull/13027
>> >
>> > In the Solr case this update probably makes more sense, since Solr 5.x
>> > is deprecated and fewer people would likely be impacted, but it still
>> > would have been good to discuss it on user@.
>> >
>> > I don't know how we can find a good equilibrium between deciding on
>> > those upgrades from maintainers vs. users without adding too much
>> > overhead. Should we maybe have a VOTE for the most sensitive
>> > dependencies? Or just treat this as a call for the maintainers? I am
>> > afraid we may end up with incompatible changes due to the lack of
>> > awareness, for not much in return, but at the same time I wonder if it
>> > makes sense to add the extra work of a discussion for minor
>> > dependencies where this matters less.
>> >
>> > Should we maybe document the sensitive dependency upgrades (the recent
>> > thread on the Avro upgrade comes to my mind too)? Or should we apply
>> > the same criteria to all? Other ideas?
>>
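[Editor's note: the 'provided' scope plus second-test-configuration approach described above could be sketched roughly as follows in a Gradle Kotlin DSL fragment. All module coordinates, version numbers, and task names here are illustrative assumptions, not Beam's actual build files.]

```kotlin
// Hypothetical build.gradle.kts sketch. The connector compiles and tests
// against the Hadoop version most runners still ship with, while a second
// test classpath forces the newer client -- analogous to how KafkaIO tests
// a newer Kafka client than its default.
val defaultHadoopVersion = "2.10.0" // assumed 2.x line still used by runners
val newerHadoopVersion = "3.2.1"    // latest stable line per hadoop.apache.org

dependencies {
    // compileOnly mirrors Maven's 'provided' scope: the user or the
    // runtime cluster supplies the Hadoop jars, not the connector.
    compileOnly("org.apache.hadoop:hadoop-client:$defaultHadoopVersion")
    testImplementation("org.apache.hadoop:hadoop-client:$defaultHadoopVersion")
}

// A second resolvable test classpath that forces the newer client version
// while reusing all other test dependencies unchanged.
val hadoop3Test: Configuration by configurations.creating {
    extendsFrom(configurations.testRuntimeClasspath.get())
    resolutionStrategy.force("org.apache.hadoop:hadoop-client:$newerHadoopVersion")
}

// Run the existing test suite, unmodified, against the forced classpath.
tasks.register<Test>("hadoop3CompatibilityTest") {
    description = "Runs the IO tests against Hadoop $newerHadoopVersion."
    testClassesDirs = sourceSets["test"].output.classesDirs
    classpath = sourceSets["test"].output + hadoop3Test
}
```

The key design point is that the same compiled test classes run twice; only the resolved classpath differs, so a compatibility break surfaces as an ordinary test failure in the extra task.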
