I wanted to bring this discussion back. Piotr has a couple of good detailed threads on particular IOs and maybe we can decide on a general pattern for this.

I think Piotr said something like "keep the main IO compatible with recent stable version, and add testing for past versions to support". This sounds good and is very straightforward. I think sdks/java/io/elasticsearch-tests is an example of this pattern.

Is there anything we could do in code to support it and make it easier and more "the default"? Which IOs have more complex overlaps (for example dep conflicts with runners or other IOs that might need cross-testing)?

I still think that when we upgrade an IO there should be a little process, like a specific thread on dev@ and user@ where we decide whether we also need to add a test module, then a VOTE to record the consensus. I want to emphasize that the purpose of the VOTE is not to make decisions, but to do it *after* we have agreement, and it is just a nice thread to be able to link to as a reminder. And having a process that requires a VOTE is just a way to make sure that we remember to have the pre-VOTE discussion.

Kenn

On Fri, Oct 23, 2020 at 10:55 AM Robert Bradshaw wrote:

> On Fri, Oct 23, 2020 at 10:16 AM Luke Cwik wrote:
> >
> > An additional thing I forgot to mention was that if we only had portable
> > runners our BOM story would be simplified since we wouldn't have the runner
> > on the classpath and users would have a consistent experience across
> > runners with regards to dependency convergence.
>
> While that may be true in principle, I think that once we move
> everything over to portable runners there will still be a strong
> desire to use "embedded" rather than "docker" environments for the
> pure-java usecases, which would require compatible classpaths.
>
> > On Fri, Oct 23, 2020 at 6:15 AM Piotr Szuberski wrote:
> >>
> >> Thank you for pointing it out. The awareness problem fits me well here -
> >> I have learned a good lesson about discussing things on the dev list.
> >>
> >> About SolrIO - I'll create a thread on user@ to discuss which versions
> >> should be supported and make the relevant changes after reaching a
> >> conclusion.
> >>
> >> On 2020/10/22 14:24:45, Ismaël Mejía wrote:
> >> > I have seen ongoing work on upgrading dependencies. This is a great task
> >> > needed for the health of the project and its IO connectors, however I am
> >> > a bit worried about the impact of these upgrades on existing users. We
> >> > should be aware that we support old versions of the clients for valid
> >> > reasons. If we update a version of a client we should ensure that it
> >> > still interacts correctly with existing users and runtime systems.
> >> > Basically we need two conditions:
> >> >
> >> > 1. We cannot update dependencies without considering the current use of
> >> >    them.
> >> > 2. We must avoid upgrading to a non-stable or non-LTS dependency version.
> >> >
> >> > For (1), in a recent thread Piotr brought up some issues about updating
> >> > Hadoop dependencies to version 3. This surprised me because the whole Big
> >> > Data ecosystem is just catching up with Hadoop 3 (Flink does not even
> >> > release artifacts for this yet, and Spark just started on version 3 some
> >> > months ago), which means that most of our users still need us to
> >> > guarantee compatibility with Hadoop 2.x dependencies.
> >> >
> >> > The Hadoop dependencies are mostly 'provided', so a way to achieve this
> >> > is by creating new test configurations that guarantee backwards (or
> >> > forwards) compatibility by providing the respective versions. This is
> >> > similar to what we do currently in KafkaIO by using version 1.0.0 by
> >> > default but testing compatibility with 2.1.0 by providing the right
> >> > dependencies too.
> >> >
> >> > The same thread also discusses upgrading to the latest version, 3.3.x,
> >> > but per (2) we should not consider upgrades to non-stable versions; the
> >> > current stable version of Hadoop is 3.2.1.
> >> > https://hadoop.apache.org/docs/stable/
> >> >
> >> > I also saw a recent upgrade of SolrIO to version 8, which may affect some
> >> > users of previous versions, with no discussion about it on the mailing
> >> > lists and no backwards compatibility guarantees.
> >> > https://github.com/apache/beam/pull/13027
> >> >
> >> > In the Solr case I think this update probably makes more sense, since
> >> > Solr 5.x is deprecated and fewer people would probably be impacted, but
> >> > it still would have been good to discuss this on user@.
> >> >
> >> > I don't know how we can find a good equilibrium between maintainers and
> >> > users in deciding on those upgrades without adding much overhead. Should
> >> > we maybe have a VOTE for the most sensitive dependencies? Or just assume
> >> > this is a criterion for the maintainers? I am afraid we may end up with
> >> > incompatible changes due to the lack of awareness or for not much in
> >> > return, but at the same time I wonder if it makes sense to add the extra
> >> > work of discussion for minor dependencies where this matters less.
> >> >
> >> > Should we maybe document the sensitive dependency upgrades (the recent
> >> > thread on the Avro upgrade comes to my mind too)? Or should we have the
> >> > same criteria for all? Other ideas?
