On Fri, Oct 23, 2020 at 10:16 AM Luke Cwik <[email protected]> wrote:
>
> An additional thing I forgot to mention was that if we only had portable 
> runners our BOM story would be simplified since we wouldn't have the runner 
> on the classpath and users would have a consistent experience across runners 
> with regards to dependency convergence.

While that may be true in principle, I think that once we move
everything over to portable runners there will still be a strong
desire to use "embedded" rather than "docker" environments for
pure-Java use cases, which would require compatible classpaths.


> On Fri, Oct 23, 2020 at 6:15 AM Piotr Szuberski <[email protected]> 
> wrote:
>>
>> Thank you for pointing it out. The awareness problem applies to me here -
>> it is a good lesson to discuss things on the dev list first.
>>
>> About SolrIO - I'll create a thread on @users to discuss which versions 
>> should be supported and make relevant changes after getting a conclusion.
>>
>> On 2020/10/22 14:24:45, Ismaël Mejía <[email protected]> wrote:
>> > I have seen ongoing work on upgrading dependencies. This is a great task,
>> > needed for the health of the project and its IO connectors; however, I am
>> > a bit worried about the impact of these upgrades on existing users. We
>> > should be aware that we support old versions of the clients for valid
>> > reasons. If we update a client version we should ensure that it still
>> > interacts correctly with existing users and runtime systems. Basically we
>> > need two conditions:
>> >
>> > 1. We cannot update dependencies without considering the current use of 
>> > them.
>> > 2. We must avoid upgrading to a non-stable or non-LTS dependency version
>> >
>> > For (1), in a recent thread Piotr brought up some issues about updating
>> > Hadoop dependencies to version 3. This surprised me because the whole Big
>> > Data ecosystem is just catching up with Hadoop 3 (Flink does not even
>> > release artifacts for it yet, and Spark only moved to version 3 some
>> > months ago), which means that most of our users still need us to
>> > guarantee compatibility with Hadoop 2.x dependencies.
>> >
>> > The Hadoop dependencies are mostly 'provided', so a way to achieve this
>> > is by creating new test configurations that guarantee backwards (or
>> > forwards) compatibility by providing the respective versions. This is
>> > similar to what we currently do in KafkaIO: we use version 1.0.0 by
>> > default but test compatibility with 2.1.0 by providing the right
>> > dependencies too.
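
The pattern described above can be sketched as a Gradle fragment. All
names, artifact coordinates, and version numbers below are illustrative
assumptions, not Beam's actual build configuration: the idea is simply to
compile against the oldest supported version as a 'provided'-style
dependency and register one extra test task per other supported version
that swaps the jars on the test runtime classpath only.

```groovy
// Hypothetical build.gradle fragment (names and versions are
// illustrative, not Beam's actual build).
def hadoopVersions = ["2.10.1", "3.2.1"]

dependencies {
  // Baseline version with 'provided' semantics: users supply their
  // own Hadoop jars at runtime.
  compileOnly "org.apache.hadoop:hadoop-client:2.10.1"
  testImplementation "org.apache.hadoop:hadoop-client:2.10.1"
}

hadoopVersions.each { hv ->
  tasks.register("hadoopVersion${hv.replace('.', '_')}Test", Test) {
    description = "Runs the tests against Hadoop ${hv}"
    testClassesDirs = sourceSets.test.output.classesDirs
    // Resolve the alternate Hadoop version in its own detached
    // configuration and put it first on the test classpath, so it
    // shadows the baseline version.
    classpath = configurations.detachedConfiguration(
        dependencies.create("org.apache.hadoop:hadoop-client:${hv}")) +
      sourceSets.test.runtimeClasspath
  }
}
```

Running `./gradlew hadoopVersion3_2_1Test` would then exercise the same
test suite against the alternate version without changing what the
published artifact compiles against.
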
>> >
>> > The same thread also discusses upgrading to the latest version, 3.3.x,
>> > but per (2) we should not consider upgrades past the current stable
>> > version of Hadoop, which is 3.2.1.  https://hadoop.apache.org/docs/stable/
>> >
>> > I also saw a recent upgrade of SolrIO to version 8, which may affect
>> > some users of previous versions; it happened with no discussion on the
>> > mailing lists and no backwards-compatibility guarantees.
>> > https://github.com/apache/beam/pull/13027
>> >
>> > In the Solr case this update probably makes more sense, since Solr 5.x
>> > is deprecated and fewer people would likely be impacted, but it would
>> > still have been good to discuss it on user@.
>> >
>> > I don't know how we can find a good equilibrium between deciding on
>> > those upgrades by maintainers vs. by users without adding much overhead.
>> > Should we maybe have a VOTE for the most sensitive dependencies, or just
>> > leave this to the maintainers' judgment? I am afraid we may end up with
>> > incompatible changes due to a lack of awareness, and for not much in
>> > return; at the same time I wonder if it makes sense to add the extra
>> > work of discussion for minor dependencies where this matters less.
>> >
>> > Should we maybe document the sensitive dependency upgrades (the recent
>> > thread on the Avro upgrade comes to my mind too)? Or should we have the
>> > same criteria for all? Other ideas?
>> >
