Hey everyone,

I appreciate that the Beam community is having another look at this.

I don't have any strong opinions on these options, but think
that extracting Avro in its own extension is the better choice.

I've already given a stab at extracting Avro in its own module (PR: 12748
[1]). The way I see it, this is less of a technical challenge and more of a
Beam community consensus challenge because of backwards compatibility and
historical reasons. If the beam community does reach consensus, I'd be more
than happy to do the work.

Cheers,
Cristian

[1] https://github.com/apache/beam/pull/12748

On Fri, May 13, 2022 at 8:56 AM Moritz Mack <mm...@talend.com> wrote:

> Thanks so much for all these pointers, Alexey. Having that context really
> helps!
>
>
>
> Skimming through the past conversations, this one key consideration hasn’t
> changed and seems still critical:
>
> AvroCoder is the de facto standard for encoding complex user types (with
> SchemaCoder still being experimental).
>
> Consequently, any backwards incompatible change in that respect would
> massively impact users.
>
>
>
> Having that in mind, here’s my 2 cents on the options mentioned:
>
>
>
>    1. Bump Avro in core:
>       1. This will seamlessly work for users that just need a coder for
>       complex types, but don’t use Avro themselves.
>       2. Anyone using Avro APIs from user code might get into trouble.
>    2. Support different Avro versions & making the dependency provided
>       1. In terms of user impact, making Avro provided will be very
>       troublesome. Unless Avro remains available on the classpath through
>       different ways, this would break things in a really bad way. Worst of 
> all,
>       it might not even break at compile time but at runtime only.
>       2. I don’t think this option requires Avro being a provided
>       dependency. If the default is kept as is, anyone can (safely) bump Avro 
> to
>       a higher (supported) version on their classpath. In that case it would 
> be
>       good to warn users about CVEs in logs if they remain on an old version 
> of
>       Avro. But I’m not sure how feasible (reliable) it is to detect the Avro
>       minor version given that things might be repackaged into fat jars.
>    3. Extract Avro as an extension
>       1. I agree with Brian, I don’t think there’s a need to keep a
>       shaded/vendored Avro in core… but
>       2. It’s pretty hard to get this done without causing trouble for
>       users. Making this a seamless migration would require core to depend on 
> the
>       new extension module and to maintain all the old public user facing 
> APIs.
>       3. Duplicating packages between core and the new extension module
>       is going to cause issues with Java 11. That pretty much means existing
>       public user facing APIs have to remain as a (deprecated) façade in core
>       using the code in the new extension.
>       4. The above makes the effort almost look worthless in the short
>       term, but it would certainly help to have a well-defined scope that can 
> be
>       tested for compatibility with multiple different Avro versions.
>
>
>
> To finally make progress on this topic, I’d suggest starting off with
> supporting multiple different Avro versions in core (2) but to keep the
> default Avro version as is.
>
> At the same time, extracting Avro as extension in a* backwards compatible
> *way seems to be the right path forward long term together with a well
> communicated deprecation plan.
>
>
>
> Best,
>
> Moritz
>
>
>
>
>
> *From: *Brian Hulette <bhule...@google.com>
> *Date: *Thursday, 12. May 2022 at 23:21
> *To: *dev <dev@beam.apache.org>
> *Subject: *Re: [DISCUSS] Next steps for update of Avro dependency in Beam
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2
> (still have an issue with CVEs)" Do we actually need to keep avro in core
> for any reason? I thought we only had it in core for AvroCoder, schema
> support, and
>
> ZjQcmQRYFpfptBannerStart
>
> *This Message Is From an External Sender *
>
> This message came from outside your organization.
>
> Exercise caution when opening attachments or clicking any links.
>
> ZjQcmQRYFpfptBannerEnd
>
> Regarding Option (3) "but keep and shade Avro for “core” needs as v.1.8.2
> (still have an issue with CVEs)"
>
>
>
> Do we actually need to keep avro in core for any reason? I thought we only
> had it in core for AvroCoder, schema support, and IOs, which I think are
> all reasonable to separate out into an extension (this would be comparable
> to the protobuf extension). To confirm I just grepped for files in core
> that import avro:
>
> ❯ grep -liIrn 'import org\.apache\.avro' sdks/java/core/src/main
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroByteBuddyUtils.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/ConvertHelpers.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/AvroRecordSchema.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/DynamicAvroDestinations.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/SerializableAvroCodecFactory.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSink.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSchemaIOProvider.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/io/ConstantAvroDestination.java
>
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroGenericCoder.java
> sdks/java/core/src/main/java/org/apache/beam/sdk/coders/AvroCoder.java
>
> Brian
>
>
>
> On Thu, May 12, 2022 at 2:08 PM Robert Bradshaw <rober...@google.com>
> wrote:
>
> Keeping avro in our public (core) API and as an internal dependency
> seems to be a recurring pain point, I would be all for pulling it out
> (basically option 3) and subsequently updating our internal version
> (hopefully no backwards compatibility issues here) and letting the
> extension live with a variety of versions (insofar as this is
> feasible).
>
> On Thu, May 12, 2022 at 10:29 AM Alexey Romanenko
> <aromanenko....@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > Sorry in advance for a long email.
> > TL;DR: Let’s discuss the next steps to update Avro dependency in Beam.
> >
> > I’d like to come back to this old and quite sensitive topic here which
> is Apache Avro version update in Beam. Along the time, we already had
> several discussions on this (for example [1]) but without any concrete
> resolutions in the end, iirc.
> >
> > As we all know, Beam still depends on quite old Avro version 1.8.2 and
> there were some attempts to bump it to more recent ones. One of the main
> reasons to bump an Avro version, imho, is that Avro 1.8.2 dependency brings
> several CVEs [2], but the latest Avro 1.11.0 brings only one [3]
> >
> > In the same time, this update with introduce some incompatible changes
> that Avro has between versions and this may affect Beam users and
> potentially it may affect transitive dependencies while using Beam with
> other project that use Avro as well:
> > - Avro completely moved to java.time.* instead of org.joda.time.*. So,
> we need to adjust date/time conversions from/to Beam schema accordingly
> since Beam schema still uses joda.time. It will require users to regenerate
> already generated Java code with avro-compiler (if any) otherwise it won’t
> compile;
> > - Some minor changes in Avro dependencies and user API;
> > - Something else?
> >
> > I know that here, on the list, we have people from Avro community that
> are much more experienced in this than me - so, please correct me if I say
> something wrong or not 100% correct.
> >
> >
> > In Beam, we also performed several attempts to update Avro - for
> example, [4], [5], [6] and others.
> >
> > To make such update easier in the future, we also discussed to move Avro
> dependency out of core Beam [7] and there were an attempt to do that [8] by
> finally this PR was closed with a resolution that it’s not actually needed
> and we may just want to test Beam with different Avro versions [9]
> >
> > The latest work on this was a PR to support several versions of Avro in
> Beam (1.8.x and 1.9.x) [10] which still introduces some breaking changes
> for users, iirc.
> >
> > So, seems that we are a bit stuck on this topic, though, imho, we need
> to decide how move forward mostly because of CVEs in old Avro versions and
> future Avro updates in Beam.
> >
> > The potential options (as I can see them):
> >
> > 1) Bump Avro dependency to the latest one (1.11.0) or the possible more
> recent one
> > - Pros:
> > - latest/recent Avro dependency;
> > - potentially easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - potentially issues with other projects that use Avro (like Apache
> Spark e.g.).
> >
> > 2) Support different Avro versions in Beam, make Avro dependency provided
> > - Pros:
> > - user decides which versions to use;
> > - easy to update in the future;
> > - Cons:
> > - breaking change for users;
> > - not fact that it’s possible to implement in reality;
> > - more tests to test Beam with different Avro versions
> >
> > 3) Extract Avro as an extension, like we do for other formats, and
> update to latest Avro version, but keep and shade Avro for “core” needs as
> v.1.8.2 (still have an issue with CVEs)
> >
> > 4) Anything else?
> >
> >
> > Please, share your thoughts on this and correct me if I stated something
> wrong. The goal of this discussion is finally to move forward with Avro
> update topic.
> >
> > —
> > Alexey
> >
> >
> > [1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
> <https://urldefense.com/v3/__https:/lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9kMOM1iA$>
> > [2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
> <https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.8.2__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-JQRaI8A$>
> > [3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
> <https://urldefense.com/v3/__https:/mvnrepository.com/artifact/org.apache.avro/avro/1.11.0__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-P2VwtLA$>
> > [4] https://github.com/apache/beam/pull/9779
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/9779__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-C6txcQA$>
> > [5] https://github.com/apache/beam/pull/17372
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17372__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw8mE0kFWg$>
> > [6] https://github.com/apache/beam/pull/17246
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/17246__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw-6l8e4lQ$>
> > [7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
> <https://urldefense.com/v3/__https:/lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9iPVZxRg$>
> > [8] https://github.com/apache/beam/pull/12748
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/12748__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9YWZV1hQ$>
> > [9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
> <https://urldefense.com/v3/__https:/lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw9T8IrnEg$>
> > [10] https://github.com/apache/beam/pull/16271
> <https://urldefense.com/v3/__https:/github.com/apache/beam/pull/16271__;!!CiXD_PY!U48m_hnNQcvhj_OqIeZdFQ8kew4_4WjRiSJ6WV9W6WwisJ1F48Slu8Uabj2DXWI3yku9gw81M4oMhQ$>
> >
> >
> >
> >
> >
> >
>
> *As a recipient of an email from Talend, your contact personal data will
> be on our systems. Please see our privacy notice.
> <https://www.talend.com/privacy/>*
>
>
>

Reply via email to