An additional thing I forgot to mention: if we only had portable runners, our
BOM story would be simpler, since the runner would not be on the user's
classpath and users would get a consistent experience across runners with
regard to dependency convergence.
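
To illustrate from the user side, roughly (a sketch only, in Gradle Kotlin
DSL; the artifact coordinates and versions are from memory, so double-check
them):

  dependencies {
      // The BOM pins the Beam module versions consistently...
      implementation(platform("org.apache.beam:beam-sdks-java-bom:2.25.0"))
      implementation("org.apache.beam:beam-sdks-java-core")

      // ...but a classpath runner such as Flink still drags in its own
      // dependency stack, which is where the convergence problems start.
      runtimeOnly("org.apache.beam:beam-runners-flink-1.10")

      // With only portable runners, a single job-submission artifact would
      // be the runner entry, and the BOM alone would describe the
      // user-facing classpath.
      // runtimeOnly("org.apache.beam:beam-runners-portability-java")
  }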

On Fri, Oct 23, 2020 at 6:15 AM Piotr Szuberski <[email protected]>
wrote:

> Thank you for pointing it out. The awareness problem applies to me here - it
> is a good lesson for me to discuss these things on the dev list first.
>
> About SolrIO - I'll create a thread on @users to discuss which versions
> should be supported and make the relevant changes once we reach a conclusion.
>
> On 2020/10/22 14:24:45, Ismaël Mejía <[email protected]> wrote:
> > I have seen ongoing work on upgrading dependencies. This is a great task,
> > needed for the health of the project and its IO connectors, but I am a bit
> > worried about the impact of these upgrades on existing users. We should be
> > aware that we support old versions of the clients for valid reasons. If we
> > update the version of a client we should ensure that it still interacts
> > correctly with existing users and runtime systems. Basically we need two
> > conditions:
> >
> > 1. We cannot update dependencies without considering the current use of
> >    them.
> > 2. We must avoid upgrading to a non-stable or non-LTS dependency version.
> >
> > For (1), in a recent thread Piotr brought up some issues about updating
> > the Hadoop dependencies to version 3. This surprised me because the whole
> > Big Data ecosystem is just catching up with Hadoop 3 (Flink does not even
> > release artifacts for it yet, and Spark only started on version 3 some
> > months ago), which means that most of our users still need us to guarantee
> > compatibility with Hadoop 2.x dependencies.
> >
> > The Hadoop dependencies are mostly 'provided', so one way to achieve this
> > is to create new test configurations that guarantee backwards (or forwards)
> > compatibility by providing the respective versions. This is similar to what
> > we currently do in KafkaIO, which uses version 1.0.0 by default but also
> > tests compatibility with 2.1.0 by providing the right dependencies.
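> >
> > Roughly something like this (just a sketch in Gradle Kotlin DSL, names and
> > versions from memory; the actual Beam build wires this up differently):
> >
> >   // Extra configuration that provides a different Hadoop major version
> >   // only for an additional test run. Production code keeps compiling
> >   // against the default version, which stays 'provided' (compileOnly is
> >   // the closest stock Gradle analogue).
> >   val hadoop3Test by configurations.creating {
> >       extendsFrom(configurations.testImplementation.get())
> >   }
> >
> >   dependencies {
> >       compileOnly("org.apache.hadoop:hadoop-client:2.10.1")
> >       testImplementation("org.apache.hadoop:hadoop-client:2.10.1")
> >       // Gradle's conflict resolution picks the newer client when this
> >       // configuration is resolved.
> >       "hadoop3Test"("org.apache.hadoop:hadoop-client:3.2.1")
> >   }
> >
> >   tasks.register<Test>("hadoop3CompatTest") {
> >       description = "Run the existing tests against the Hadoop 3.x client."
> >       testClassesDirs = sourceSets["test"].output.classesDirs
> >       classpath = sourceSets["test"].output +
> >           sourceSets["main"].output +
> >           hadoop3Test
> >   }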
> >
> > The same thread also discusses upgrading to the latest version, 3.3.x, but
> > per (2) we should not consider upgrades to non-stable versions; the current
> > stable version of Hadoop is 3.2.1. https://hadoop.apache.org/docs/stable/
> >
> > I also saw a recent upgrade of SolrIO to version 8, which may affect some
> > users of previous versions, with no discussion about it on the mailing
> > lists and no backwards-compatibility guarantees.
> > https://github.com/apache/beam/pull/13027
> >
> > In the Solr case I think the update probably makes more sense, since Solr
> > 5.x is deprecated and fewer people would likely be impacted, but it still
> > would have been good to discuss this on user@.
> >
> > I don't know how we can find a good equilibrium between maintainers and
> > users in deciding on those upgrades, without adding much overhead. Should
> > we maybe have a VOTE for the most sensitive dependencies? Or just assume
> > this is a criterion for the maintainers? I am afraid we may end up with
> > incompatible changes due to a lack of awareness, and for not much in
> > return, but at the same time I wonder if it makes sense to add the extra
> > work of a discussion for minor dependencies where this matters less.
> >
> > Should we maybe document the sensitive dependency upgrades (the recent
> > thread on the Avro upgrade comes to mind too)? Or should we have the same
> > criteria for all? Other ideas?
> >
>
