Hi everyone,

Sorry in advance for the long email.
TL;DR: Let’s discuss the next steps to update the Avro dependency in Beam.

I’d like to come back to an old and quite sensitive topic: updating the
Apache Avro version in Beam. Over time, we have had several discussions on
this (for example [1]), but without any concrete resolution in the end, iirc.

As we all know, Beam still depends on the quite old Avro version 1.8.2, and
there have been some attempts to bump it to a more recent one. One of the main
reasons to bump the Avro version, imho, is that the Avro 1.8.2 dependency
brings several CVEs [2], while the latest Avro 1.11.0 brings only one [3].

At the same time, this update will introduce some incompatible changes that
Avro has made between versions. This may affect Beam users, and it may also
affect transitive dependencies when Beam is used with other projects that use
Avro as well:
- Avro completely moved to java.time.* instead of org.joda.time.*. So, we need
to adjust date/time conversions from/to Beam schema accordingly, since Beam
schema still uses joda.time. It will also require users to regenerate any Java
code already generated with avro-compiler, otherwise it won’t compile;
- Some minor changes in Avro dependencies and user API;
- Something else?
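
To make the date/time point a bit more concrete, here is a minimal sketch of
the java.time side of such conversions. It relies only on how the Avro
specification encodes the timestamp-millis logical type (a long, milliseconds
since the epoch) and the date logical type (an int, days since 1970-01-01).
The class and method names are hypothetical, just for illustration. This is
not Beam or Avro API, and it does not show the Beam schema side at all.

```java
import java.time.Instant;
import java.time.LocalDate;

// Illustrative sketch only: AvroTimeSketch and its methods are hypothetical
// names, not part of Beam or Avro.
final class AvroTimeSketch {

    // timestamp-millis is stored as a long: milliseconds since the Unix epoch.
    public static Instant fromTimestampMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static long toTimestampMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    // date is stored as an int: days since 1970-01-01.
    public static LocalDate fromAvroDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    public static int toAvroDate(LocalDate date) {
        // toIntExact throws ArithmeticException instead of silently overflowing.
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        long millis = 1_650_000_000_000L;
        Instant instant = fromTimestampMillis(millis);
        // Round trip through java.time keeps the millisecond value intact.
        System.out.println(instant + " -> " + toTimestampMillis(instant));

        LocalDate date = fromAvroDate(19097);
        System.out.println(date + " -> " + toAvroDate(date));
    }
}
```

Since a Joda instant also exposes its epoch milliseconds (getMillis()), the
long value is the natural bridge between the two libraries, so the conversion
code itself should be small. The generated code and the user-facing types are
where the real breakage is.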

I know that here, on the list, we have people from the Avro community who are
much more experienced in this than me - so, please correct me if I say
something wrong or not 100% correct.


In Beam, we have also made several attempts to update Avro - for example, [4],
[5], [6] and others.

To make such updates easier in the future, we also discussed moving the Avro
dependency out of core Beam [7], and there was an attempt to do that [8], but
in the end this PR was closed with the resolution that it’s not actually needed
and we may just want to test Beam with different Avro versions [9].

The latest work on this was a PR to support several versions of Avro in Beam 
(1.8.x and 1.9.x) [10] which still introduces some breaking changes for users, 
iirc.

So, it seems that we are a bit stuck on this topic, though, imho, we need to
decide how to move forward, mostly because of the CVEs in old Avro versions and
for the sake of future Avro updates in Beam.

The potential options (as I can see them):

1) Bump the Avro dependency to the latest version (1.11.0), or to a more recent
one if available
        - Pros: 
                - latest/recent Avro dependency; 
                - potentially easy to update in the future;
        - Cons: 
                - breaking change for users; 
                - potential issues with other projects that use Avro (e.g.
                  Apache Spark).

2) Support different Avro versions in Beam and make the Avro dependency
“provided”
        - Pros: 
                - user decides which versions to use;
                - easy to update in the future;
        - Cons: 
                - breaking change for users; 
                - not certain that it’s actually possible to implement;
                - more tests needed to cover Beam with different Avro versions.

3) Extract Avro as an extension, like we do for other formats, and update it to
the latest Avro version, but keep a shaded Avro 1.8.2 for “core” needs (which
still leaves the CVE issue)

4) Anything else?


Please share your thoughts on this and correct me if I stated something wrong.
The goal of this discussion is to finally move forward on the Avro update topic.

—
Alexey 


[1] https://lists.apache.org/thread/bkwrbqg2nwp1xq1j57xt3kvmy93vpj9r
[2] https://mvnrepository.com/artifact/org.apache.avro/avro/1.8.2
[3] https://mvnrepository.com/artifact/org.apache.avro/avro/1.11.0
[4] https://github.com/apache/beam/pull/9779
[5] https://github.com/apache/beam/pull/17372
[6] https://github.com/apache/beam/pull/17246
[7] https://lists.apache.org/thread/fw4w6xgm05nl5cg502co97pt6cygt4on
[8] https://github.com/apache/beam/pull/12748
[9] https://lists.apache.org/thread/y76wjqprm8dyfxxfwcqbzxtht2qkrgzg
[10] https://github.com/apache/beam/pull/16271





