Top-post: I'm generally in favor of moving Avro out of core, specifically
because it is something where different users (and dep chains) want
different versions. The pain caused by having it in core has come up a lot
for me. I don't think backwards-compatibility absolutism helps our users in
this case, but I do think a gradual migration to ease the pain is
important.

On Fri, Sep 11, 2020 at 9:30 AM Robert Bradshaw <rober...@google.com> wrote:

> On Thu, Sep 10, 2020 at 2:48 PM Brian Hulette <bhule...@google.com> wrote:
>
>>
>> On Tue, Sep 8, 2020 at 9:18 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> IIRC Dataflow (and perhaps others) implicitly depends on Avro to write
>>> out intermediate files (e.g. for non-shuffle fusion breaks). Would
>>> this break if we just removed it?
>>>
>>
>> I think Dataflow would just need to declare a dependency on the new
>> extension.
>>
>
> I'm not sure this would solve the underlying problem (it just pushes it
> onto users and makes it more obscure). Maybe my reasoning is incorrect, but
> from what I see
>
> * Many Beam modules (e.g. dataflow, spark, file-based-io, sql, kafka,
> parquet, ...) depend on Avro.
> * Using Avro 1.9 with the above modules doesn't work.
>

I suggest taking these case by case.

 - Dataflow: implementation detail, probably not a major problem (we can
just upgrade the pre-portability worker; for portability it is a non-issue)
 - Spark: we probably need to use whatever Avro version works for each
version of Spark (portability mitigates this)
 - SQL: happy to upgrade the library version; it just needs to be able to
read the data, and the Avro version is not user-facing
 - IOs: I'm guessing we have a diamond dependency getting resolved by
clobbering (a build-level workaround is sketched just after this list). A
quick glance suggests Parquet is on Avro 1.10.0; Kafka's Avro serde is a
separate artifact distributed by Confluent, with its Avro version obscured
by parent poms and properties, but their examples use Avro 1.9.1.
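
To make the clobbering concrete, here is a minimal sketch of how a user's
build could pin a single Avro version across that diamond. It uses the
Gradle Kotlin DSL; the Beam and Avro versions and the choice of the Parquet
IO are placeholders for illustration, not recommendations from this thread.

    // build.gradle.kts: illustrative only, versions are assumptions.
    plugins {
        java
    }

    repositories {
        mavenCentral()
    }

    dependencies {
        // Several of these pull in Avro transitively, at different versions.
        implementation("org.apache.beam:beam-sdks-java-core:2.24.0")
        implementation("org.apache.beam:beam-sdks-java-io-parquet:2.24.0")
        // The Avro version this project actually codes against.
        implementation("org.apache.avro:avro:1.9.2")
    }

    // Left alone, Gradle resolves the Avro diamond to a single version by
    // picking the highest it sees (the clobbering mentioned above). Forcing
    // the version makes the choice deliberate rather than accidental.
    configurations.all {
        resolutionStrategy {
            force("org.apache.avro:avro:1.9.2")
        }
    }

Whether the forced version actually works still depends on which modules
end up on the classpath, which is exactly why pulling Avro out of core (and
letting each module declare its own needs) helps.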

> Doesn't this mean that, even if we remove Avro from Beam core, a user who
> uses Beam + Avro 1.9 will have issues with any of the above (fairly
> fundamental) modules?
>
>> We could mitigate this by first adding the new extension module and
>> deprecating the core Beam counterpart for a release (or multiple releases).
>>
>
> +1 to Reuven's concerns here.
>

Agree that we should add the module and keep shipping it for at least one
release, probably a few, because users tend to hop a few releases at a
time. We have some precedent for breaking changes from dropping Python and
Flink versions after asking users on user@ and polling on Twitter, etc.
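
To make the migration path concrete, a user's build during that deprecation
window might look roughly like the sketch below (Gradle Kotlin DSL again).
The extension module name and the version numbers are hypothetical; nothing
like this has been decided in the thread.

    // build.gradle.kts: a sketch of the transition, not a decided layout.
    plugins {
        java
    }

    repositories {
        mavenCentral()
    }

    dependencies {
        // Core still ships the (deprecated) Avro classes for a few
        // releases, so existing pipelines keep compiling unchanged...
        implementation("org.apache.beam:beam-sdks-java-core:2.25.0")
        // ...while migrated code switches to the extracted module
        // (hypothetical name) and pins whichever Avro version it needs.
        implementation("org.apache.beam:beam-sdks-java-extensions-avro:2.25.0")
        implementation("org.apache.avro:avro:1.9.2")
    }

Once that window closes, dropping the Avro classes from core becomes a
removal of already-deprecated code rather than a surprise break.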

Kenn
