Hey everyone, I appreciate that the Beam community is having another look at this.
I don't have any strong opinions on these options, but think that extracting Avro in its own extension is the better choice. I've already given a stab at extracting Avro in its own module (PR: 12748 [1]). The way I see it, this is less of a technical challenge and more of a Beam community consensus challenge because of backwards compatibility and historical reasons. If the beam community does reach consensus, I'd be more than happy to do the work. Cheers, Cristian [1] https://github.com/apache/beam/pull/12748 On Fri, May 13, 2022 at 8:56 AM Moritz Mack <mm...@talend.com> wrote: > Thanks so much for all these pointers, Alexey. Having that context really > helps! > > > > Skimming through the past conversations, this one key consideration hasn’t > changed and seems still critical: > > AvroCoder is the de facto standard for encoding complex user types (with > SchemaCoder still being experimental). > > Consequently, any backwards incompatible change in that respect would > massively impact users. > > > > Having that in mind, here’s my 2 cents on the options mentioned: > > > > 1. Bump Avro in core: > 1. This will seamlessly work for users that just need a coder for > complex types, but don’t use Avro themselves. > 2. Anyone using Avro APIs from user code might get into trouble. > 2. Support different Avro versions & making the dependency provided > 1. In terms of user impact, making Avro provided will be very > troublesome. Unless Avro remains available on the classpath through > different ways, this would break things in a really bad way. Worst of > all, > it might not even break at compile time but at runtime only. > 2. I don’t think this option requires Avro being a provided > dependency. If the default is kept as is, anyone can (safely) bump Avro > to > a higher (supported) version on their classpath. In that case it would > be > good to warn users about CVEs in logs if they remain on an old version > of > Avro. But I’m not sure how feasible (reliable) it is to detect the Avro > minor version given that things might be repackaged into fat jars. > 3. Extract Avro as an extension > 1. I agree with Brian, I don’t think there’s a need to keep a > shaded/vendored Avro in core… but > 2. It’s pretty hard to get this done without causing trouble for > users. Making this a seamless migration would require core to depend on > the > new extension module and to maintain all the old public user facing > APIs. > 3. Duplicating packages between core and the new extension module > is going to cause issues with Java 11. That pretty much means existing > public user facing APIs have to remain as a (deprecated) façade in core > using the code in the new extension. > 4. The above makes the effort almost look worthless in the short > term, but it would certainly help to have a well-defined scope that can > be > tested for compatibility with multiple different Avro versions. > > > > To finally make progress on this topic, I’d suggest starting off with > supporting multiple different Avro versions in core (2) but to keep the > default Avro version as is. > > At the same time, extracting Avro as extension in a* backwards compatible > *way seems to be the right path forward long term together with a well > communicated deprecation plan. > > > > Best, > > Moritz > > > > > > *From: *Brian Hulette <bhule...@google.com> > *Date: *Thursday, 12. May 2022 at 23:21 > *To: *dev <dev@beam.apache.org> > *Subject: *Re: [DISCUSS] Next steps for update of Avro dependency in Beam > > Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 > (still have an issue with CVEs)" Do we actually need to keep avro in core > for any reason? I thought we only had it in core for AvroCoder, schema > support, and > > ZjQcmQRYFpfptBannerStart > > *This Message Is From an External Sender * > > This message came from outside your organization. > > Exercise caution when opening attachments or clicking any links. > > ZjQcmQRYFpfptBannerEnd > > Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2 > (still have an issue with CVEs)" > > > > Do we actually need to keep avro in core for any reason? I thought we only > had it in core for AvroCoder, schema support, and IOs, which I think are > all reasonable to separate out into an extension (this would be comparable > to the protobuf extension). To confirm I just grepped for files in core > that import avro: > > ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main > > sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java > sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java > sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java > sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java > > sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java > sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java > > Brian > > > > On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <rober...@google.com> > wrote: > > Keeping avro in our public (core) API and as an internal dependency > seems to be a recurring pain point, I would be all for pulling it out > (basically option 3) and subsequently updating our internal version > (hopefully no backwards compatibility issues here) and letting the > extension live with a variety of versions (insofar as this is > feasible). > > On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko > <aromanenko....@gmail.com> wrote: > > > > Hi everyone, > > > > Sorry in advance for a long email. > > TL;DR: Let’s discuss the next steps to update Avro dependency in Beam. > > > > I’d like to come back to this old and quite sensitive topic here which > is Apache Avro version update in Beam. Along the time, we already had > several discussions on this (for example [1]) but without any concrete > resolutions in the end, iirc. > > > > As we all know, Beam still depends on quite old Avro version 1.8.2 and > there were some attempts to bump it to more recent ones. One of the main > reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings > several CVEs [2], but the latest Avro 1.11.0 brings only one [3] > > > > In the same time, this update with introduce some incompatible changes > that Avro has between versions and this may affect Beam users and > potentially it may affect transitive dependencies while using Beam with > other project that use Avro as well: > > - Avro completely moved to java.time.* instead of org.joda.time.*. So, > we need to adjust date/time conversions from/to Beam schema accordingly > since Beam schema still uses joda.time. It will require users to regenerate > already generated Java code with avro-compiler (if any) otherwise it won’t > compile; > > - Some minor changes in Avro dependencies and user API; > > - Something else? > > > > I know that here, on the list, we have people from Avro community that > are much more experienced in this than me - so, please correct me if I say > something wrong or not 100% correct. > > > > > > In Beam, we also performed several attempts to update Avro - for > example, [4], [5], [6] and others. > > > > To make such update easier in the future, we also discussed to move Avro > dependency out of core Beam [7] and there were an attempt to do that [8] by > finally this PR was closed with a resolution that it’s not actually needed > and we may just want to test Beam with different Avro versions [9] > > > > The latest work on this was a PR to support several versions of Avro in > Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes > for users, iirc. > > > > So, seems that we are a bit stuck on this topic, though, imho, we need > to decide how move forward mostly because of CVEs in old Avro versions and > future Avro updates in Beam. > > > > The potential options (as I can see them): > > > > 1) Bump Avro dependency to the latest one (1.11.0) or the possible more > recent one > > - Pros: > > - latest/recent Avro dependency; > > - potentially easy to update in the future; > > - Cons: > > - breaking change for users; > > - potentially issues with other projects that use Avro (like Apache > Spark e.g.). > > > > 2) Support different Avro versions in Beam, make Avro dependency provided > > - Pros: > > - user decides which versions to use; > > - easy to update in the future; > > - Cons: > > - breaking change for users; > > - not fact that it’s possible to implement in reality; > > - more tests to test Beam with different Avro versions > > > > 3) Extract Avro as an extension, like we do for other formats, and > update to latest Avro version, but keep and shade Avro for “core” needs as > v.1.8.2 (still have an issue with CVEs) > > > > 4) Anything else? > > > > > > Please, share your thoughts on this and correct me if I stated something > wrong. The goal of this discussion is finally to move forward with Avro > update topic. > > > > — > > Alexey > > > > > > [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r > <https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9kMOM1iA$> > > [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2 > <https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-JQRaI8A$> > > [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0 > <https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-P2VwtLA$> > > [4] https://github.com/apache/beam/pull/9779 > <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-C6txcQA$> > > [5] https://github.com/apache/beam/pull/17372 > <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw8mE0kFWg$> > > [6] https://github.com/apache/beam/pull/17246 > <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-6l8e4lQ$> > > [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on > <https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9iPVZxRg$> > > [8] https://github.com/apache/beam/pull/12748 > <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9YWZV1hQ$> > > [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg > <https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9T8IrnEg$> > > [10] https://github.com/apache/beam/pull/16271 > <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw81M4oMhQ$> > > > > > > > > > > > > > > *As a recipient of an email from Talend, your contact personal data will > be on our systems. Please see our privacy notice. > <https://www.talend.com/privacy/>* > > >