All the proposed solutions seem reasonable; I'm not sure any one of them has
an edge over the others. I guess it depends on how cautiously the community
would like to move.

Maybe it's just my impression, but it seems to me that a few changes are
being held back for the sake of backwards compatibility. If that is truly the
case, maybe we can start thinking about the next major version of Beam, where
we could:
- tackle the dependency report email (Beam Dependency Check Report
(2020-09-21)); there are quite a few deps in there that need to be updated,
and I'm sure some of the updates will bring more breaking changes
- review architectural decisions that are difficult to correct without
breaking things (e.g. Avro in core)
- add compatibility with Java 11+
- implement features that we can't build on our current code base

I'm not sure what Beam's roadmap is, but maybe we could set up a branch in
the main repo (with the CI checks) and start tackling all this work there, so
we get a better idea of the scope (and of the unforeseen issues that will
surface) before aiming for a potential RC build in the near future.

On Wed, Sep 16, 2020 at 6:40 AM Robert Bradshaw <rober...@google.com> wrote:

> An adapter seems a reasonable approach, and shouldn't be too hard.
>
> If the breakage is "we no longer provide Avro 1.8 by default; please
> depend on it explicitly if this breaks you" that seems reasonable to me, as
> it's easy to detect and remedy.
>
> On Tue, Sep 15, 2020 at 2:42 PM Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> The Avro differences in our implementation are pretty minimal if you look
>> at the PR, to the point that an Adapter, if even needed, should be really
>> tiny.
>>
>> The big backwards-incompatible changes in Avro > 1.8 were removals of
>> external libraries from the public APIs, e.g. Guava, Jackson and
>> Joda-Time. This may not seem like much, but almost every project using
>> Avro depended on some of these libraries; luckily for Beam it was only
>> Joda-Time, and that is already fixed.
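>>
>> For illustration, a sketch of the kind of call-site change this removal
>> implies (the Avro class names are real; the wrapper class name is
>> illustrative, and the actual Beam fix may look different):
>>
>> import org.apache.avro.data.TimeConversions;
>> import org.apache.avro.generic.GenericData;
>>
>> class AvroTimeSetup {
>>   // Avro 1.8 exposed Joda-Time in its public API:
>>   //   new TimeConversions.TimestampConversion()  // org.joda.time.DateTime
>>   // Avro 1.9 removed Joda-Time; the equivalent conversion uses java.time:
>>   static GenericData newGenericData() {
>>     GenericData data = new GenericData();
>>     data.addLogicalTypeConversion(
>>         new TimeConversions.TimestampMillisConversion()); // java.time.Instant
>>     return data;
>>   }
>> }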
>>
>> Keeping backwards compatibility by making Avro an extension that is
>> optional for core, and using only Avro 1.8-compatible features in Beam's
>> source code, is the simplest path and allows us to avoid all the issues.
>> Notice that the dependency that triggered the need for Avro 1.9 (and this
>> thread) is a runtime dependency used by Confluent Schema Registry, and it
>> is only an issue because sdks-java-core is leaking Avro. Apart from this I
>> am not aware of any feature in any other project that obliges anyone to
>> use Avro 1.9- or 1.10-specific code.
>>
>> Of course, a really valid reason to want a more recent version of Avro is
>> that Avro 1.8 also leaks unmaintained dependencies with serious security
>> issues (Jackson 1.x).
>>
>> So in the end my main concern is breaking existing users' code. This has
>> less impact for us (Talend) but probably more for the rest of the
>> community. Still, if we agree to break backwards compatibility for the
>> sake of cleanliness, we should probably proceed, and of course also give
>> users some warning.
>>
>> On Mon, Sep 14, 2020 at 7:13 PM Luke Cwik <lc...@google.com> wrote:
>> >
>> > In the Kafka module we reflectively figure out which version of Kafka
>> > exists[1] on the classpath and then reflectively invoke some APIs to
>> > work around differences between Kafka versions, allowing our users to
>> > bring whichever version they want.
>> >
>> > We could do something similar here and reflectively figure out which
>> > Avro is on the classpath and invoke the appropriate methods. If we are
>> > worried about the performance of reflection, we can write and compile
>> > two different versions of an Avro adapter class and choose which one to
>> > use (using reflection only once, during class loading).
>> >
>> > e.g.
>> > class AvroAdapter {
>> >   static final AvroAdapter INSTANCE;
>> >   static {
>> >     if (avro19OnClasspath()) {
>> >       INSTANCE = new Avro19Adapter();
>> >     } else {
>> >       INSTANCE = new Avro18Adapter();
>> >     }
>> >   }
>> >
>> >   ... methods needed for AvroAdapter implementations ...
>> > }
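>> >
>> > A minimal sketch of the one-time avro19OnClasspath() probe (the probed
>> > class is an assumption; any class or method that exists in exactly one
>> > of the two Avro versions would do):
>> >
>> > static boolean avro19OnClasspath() {
>> >   try {
>> >     // Avro 1.9 removed the Joda-Time based TimestampConversion from
>> >     // org.apache.avro.data.TimeConversions, so its absence suggests 1.9+.
>> >     Class.forName("org.apache.avro.data.TimeConversions$TimestampConversion");
>> >     return false; // Avro 1.8 on the classpath
>> >   } catch (ClassNotFoundException e) {
>> >     return true;  // Avro 1.9+ on the classpath
>> >   }
>> > }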
>> >
>> > Using reflection allows the user to choose which version they use and
>> pushes down the incompatibility issue from Apache Beam to our deps (e.g.
>> Spark).
>> >
>> > 1:
>> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ConsumerSpEL.java
>> >
>> > On Fri, Sep 11, 2020 at 11:42 AM Kenneth Knowles <k...@apache.org>
>> wrote:
>> >>
>> >> I am not deep on the details myself, but I have reviewed various Avro
>> >> upgrade changes such as https://github.com/apache/beam/pull/9779 and
>> >> also some internal ones that I cannot link to. I believe the changes
>> >> are small, and quite possibly we can create sdks/java/extensions/avro
>> >> that works with both Avro 1.8 and 1.9, and make the Dataflow worker
>> >> compatible with whichever version the user chooses. (I would expect
>> >> Spark is trying to get to that point too?)
>> >>
>> >> So then, if we have that, can we achieve the goals? Spark runner users
>> >> that do not use Avro in their own code get Spark's version; Spark
>> >> runner users that do use Avro have to choose anyhow; and we make the
>> >> Dataflow worker work with both 1.8 and 1.9.
>> >>
>> >> We can probably achieve the same goals by just making the core
>> >> compatible with both 1.8 and 1.9. Users who don't want the dep can
>> >> exclude it, too, though Avro doesn't have many transitive deps, so
>> >> there isn't a lot of value in excluding it. So core vs. extensions is
>> >> more of a clean-engineering concern, while compatibility with 1.9 is
>> >> user-driven.
>> >>
>> >> Kenn
>> >>
>> >> On Fri, Sep 11, 2020 at 10:49 AM Ismaël Mejía <ieme...@gmail.com>
>> wrote:
>> >>>
>> >>> > The concern here is that Avro 1.9 is not backwards compatible with
>> >>> > Avro 1.8, so the future world would not be a simple "bring your own
>> >>> > avro" but might require separate dataflow-with-avro-1.8 and
>> >>> > dataflow-with-avro-1.9 targets, which certainly isn't scalable. (Or
>> >>> > am I mistaken here? Maybe we could solve this with vendoring?)
>> >>>
>> >>> Thinking a bit about it, this looks similar to what I mentioned with
>> >>> the Spark runner, save that we cannot control those targets; for that
>> >>> reason I talked about source-code compatibility.
>> >>> Avro is really hard to shade correctly because of the way its code
>> >>> generation works; otherwise shading could have been the best solution.
>> >>>
>> >>> On Fri, Sep 11, 2020 at 7:28 PM Robert Bradshaw <rober...@google.com>
>> wrote:
>> >>> >
>> >>> > On Fri, Sep 11, 2020 at 10:05 AM Kenneth Knowles <k...@apache.org>
>> wrote:
>> >>> >>
>> >>> >> Top-post: I'm generally in favor of moving Avro out of core,
>> >>> >> specifically because it is something where different users (and dep
>> >>> >> chains) want different versions. The pain caused by having it in
>> >>> >> core has been raised to me a lot. I don't think
>> >>> >> backwards-compatibility absolutism helps our users in this case. I
>> >>> >> do think gradual migration to ease the pain is important.
>> >>> >
>> >>> >
>> >>> > Agree. Backwards compatibility is not the absolute goal; whatever
>> is best for existing and new users is what we should go for. That being
>> said, this whole issue is caused by one of our dependencies not being
>> backwards compatible itself...
>> >>> >
>> >>> >>
>> >>> >> On Fri, Sep 11, 2020 at 9:30 AM Robert Bradshaw <
>> rober...@google.com> wrote:
>> >>> >>>
>> >>> >>> On Thu, Sep 10, 2020 at 2:48 PM Brian Hulette <
>> bhule...@google.com> wrote:
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> On Tue, Sep 8, 2020 at 9:18 AM Robert Bradshaw <
>> rober...@google.com> wrote:
>> >>> >>>>>
>> >>> >>>>> IIRC Dataflow (and perhaps others) implicitly depends on Avro
>> >>> >>>>> to write out intermediate files (e.g. for non-shuffle fusion
>> >>> >>>>> breaks). Would this break if we just removed it?
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> I think Dataflow would just need to declare a dependency on the
>> new extension.
>> >>> >>>
>> >>> >>>
>> >>> >>> I'm not sure this would solve the underlying problem (it just
>> >>> >>> pushes it onto users and makes it more obscure). Maybe my
>> >>> >>> reasoning is incorrect, but from what I see:
>> >>> >>>
>> >>> >>> * Many Beam modules (e.g. dataflow, spark, file-based-io, sql,
>> >>> >>> kafka, parquet, ...) depend on Avro.
>> >>> >>> * Using Avro 1.9 with the above modules doesn't work.
>> >>> >>
>> >>> >>
>> >>> >> I suggest taking these case by case:
>> >>> >>
>> >>> >>  - Dataflow: implementation detail, probably not a major problem
>> >>> >> (we can just upgrade the pre-portability worker, while for
>> >>> >> portability it is a non-issue)
>> >>> >>  - Spark: probably need to use whatever version of Avro works for
>> >>> >> each version of Spark (portability mitigates this)
>> >>> >>  - SQL: happy to upgrade the lib version; it just needs to be able
>> >>> >> to read the data, and the Avro version is not user-facing
>> >>> >>  - IOs: I'm guessing we have a diamond dep getting resolved by
>> >>> >> clobbering. A quick glance suggests Parquet is on Avro 1.10.0;
>> >>> >> Kafka's Avro serde is a separate thing distributed by Confluent,
>> >>> >> with its Avro version obscured by parent poms and properties, but
>> >>> >> their examples use Avro 1.9.1.
>> >>> >
>> >>> >
>> >>> > The concern here is that Avro 1.9 is not backwards compatible with
>> >>> > Avro 1.8, so the future world would not be a simple "bring your own
>> >>> > avro" but might require separate dataflow-with-avro-1.8 and
>> >>> > dataflow-with-avro-1.9 targets, which certainly isn't scalable. (Or
>> >>> > am I mistaken here? Maybe we could solve this with vendoring?)
>> >>> >
>> >>> >>> Doesn't this mean that, even if we remove Avro from Beam core, a
>> >>> >>> user that uses Beam + Avro 1.9 will have issues with any of the
>> >>> >>> above (fairly fundamental) modules?
>> >>> >>>
>> >>> >>>>  We could mitigate this by first adding the new extension module
>> and deprecating the core Beam counterpart for a release (or multiple
>> releases).
>> >>> >>>
>> >>> >>>
>> >>> >>> +1 to Reuven's concerns here.
>> >>> >>
>> >>> >>
>> >>> >> Agreed, we should add the module and release it for at least one
>> >>> >> release, probably a few, because users tend to skip releases. We
>> >>> >> have some precedent for breaking changes with the Python/Flink
>> >>> >> version drops, after asking users on user@ and polling on Twitter,
>> >>> >> etc.
>> >>> >>
>> >>> >> Kenn
>>
>
