Top-post: I'm generally in favor of moving Avro out of core specifically because it is something where different users (and dep chains) want different versions. The pain caused by having it in core has come up a lot to me. I don't think backwards-compatibility absolutism helps our users in this case. I do think gradual migration to ease pain is important.
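To make "gradual migration" concrete: once an extensions module exists, for most users the switch should be roughly a one-line build change plus an import update. A sketch of what I have in mind (module coordinates, versions, and the placement of the Avro pin are illustrative, not decided):

    // build.gradle.kts -- sketch only, coordinates and versions are placeholders
    dependencies {
        // today the Avro classes (AvroCoder, AvroIO, ...) ride along in core
        implementation("org.apache.beam:beam-sdks-java-core:2.XX.0")
        // after the move, users who need them opt in explicitly
        implementation("org.apache.beam:beam-sdks-java-extensions-avro:2.XX.0")
        // and can pick whatever Avro their own dep chain needs
        implementation("org.apache.avro:avro:1.9.2")
    }

During the deprecation window the core classes would still be there, just deprecated, so people can move over a release or two at their own pace.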
On Fri, Sep 11, 2020 at 9:30 AM Robert Bradshaw <rober...@google.com> wrote:

> On Thu, Sep 10, 2020 at 2:48 PM Brian Hulette <bhule...@google.com> wrote:
>
>> On Tue, Sep 8, 2020 at 9:18 AM Robert Bradshaw <rober...@google.com> wrote:
>>
>>> IIRC Dataflow (and perhaps others) implicitly depend on Avro to write
>>> out intermediate files (e.g. for non-shuffle Fusion breaks). Would
>>> this break if we just removed it?
>>
>> I think Dataflow would just need to declare a dependency on the new
>> extension.
>
> I'm not sure this would solve the underlying problem (it just pushes it
> onto users and makes it more obscure). Maybe my reasoning is incorrect,
> but from what I see:
>
> * Many Beam modules (e.g. dataflow, spark, file-based-io, sql, kafka,
>   parquet, ...) depend on Avro.
> * Using Avro 1.9 with the above modules doesn't work.

I suggest taking these case by case:

 - Dataflow: implementation detail, probably not a major problem (we can
   just upgrade the pre-portability worker, while for portability it is a
   non-issue)
 - Spark: probably need to use whatever version of Avro works for each
   version of Spark (portability mitigates)
 - SQL: happy to upgrade the lib version; it just needs to be able to read
   the data, and the Avro version is not user-facing
 - IOs: I'm guessing that we have a diamond dep getting resolved by
   clobbering. A quick glance suggests Parquet is on Avro 1.10.0, and
   Kafka's Avro serde is a separate thing distributed by Confluent, with
   the Avro version obfuscated by use of parent poms and properties, but
   their examples use Avro 1.9.1. (See the pinning sketch at the end of
   this mail.)

> Doesn't this mean that, even if we remove avro from Beam core, a user
> that uses Beam + Avro 1.9 will have issues with any of the above (fairly
> fundamental) modules?
>
>> We could mitigate this by first adding the new extension module and
>> deprecating the core Beam counterpart for a release (or multiple
>> releases).
>
> +1 to Reuven's concerns here.

Agree we should add the module and release it for at least one release,
probably a few, because users tend to hop a few releases. We have some
precedent for breaking changes with the Python/Flink version dropping,
after asking users on user@ and polling on Twitter, etc.

Kenn
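Re the IOs diamond-dep point above: running "gradlew dependencyInsight --dependency org.apache.avro:avro" shows which Avro version is actually winning and who pulls it in. And if someone needs a specific version regardless of what the IOs declare, something like the following forces it across the graph (version is just an example):

    // build.gradle.kts -- force one Avro version for every configuration
    configurations.all {
        resolutionStrategy {
            force("org.apache.avro:avro:1.9.2")
        }
    }

That doesn't make the modules actually work with the forced version, of course, which is the real problem Robert is pointing at.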