I am surprised that the Beam website claims that we use semantic
versioning (semver) [1] in Beam [2]. We have NEVER really followed
semantic versioning: we have broken both internal and external APIs
multiple times (at least for Java), as you can see in this analysis of
source and binary compatibility between Beam versions that I did for
‘sdks/java/core’ two months ago:

https://cloudflare-ipfs.com/ipfs/QmQSkWYmzerpUjT7fhE9CF7M9hm2uvJXNpXi58mS8RKcNi/

This report was produced by running the following script, which excludes
both the @Experimental and @Internal annotations as well as many internal
packages like ‘sdk/util/’, ‘transforms/reflect/’ and ‘sdk/testing/’,
among others. For more details on the exclusions, refer to the script
code:

https://gist.github.com/iemejia/5277fc269c63c4e49f1bb065454a895e
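To give an idea of the kind of filtering the script applies, here is a rough sketch in Java (this is NOT the actual gist code; the class name ApiSurfaceFilter and the hard-coded lists are illustrative assumptions, and the real exclusion lists are in the gist above):

```java
import java.lang.annotation.Annotation;
import java.util.List;

public class ApiSurfaceFilter {
  // Hypothetical subset of the excluded package prefixes; the gist has the
  // full list of internal packages left out of the compatibility report.
  private static final List<String> EXCLUDED_PACKAGES =
      List.of(
          "org.apache.beam.sdk.util",
          "org.apache.beam.sdk.transforms.reflect",
          "org.apache.beam.sdk.testing");

  /** Returns true if the class should count toward the checked API surface. */
  public static boolean isPublicApi(Class<?> clazz) {
    for (String prefix : EXCLUDED_PACKAGES) {
      if (clazz.getName().startsWith(prefix)) {
        return false; // internal package, no compatibility guarantee
      }
    }
    for (Annotation a : clazz.getAnnotations()) {
      String name = a.annotationType().getSimpleName();
      // @Experimental and @Internal carry no compatibility guarantees in Beam.
      if (name.equals("Experimental") || name.equals("Internal")) {
        return false;
      }
    }
    return true;
  }
}
```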

Respecting semantic versioning is REALLY HARD and a strong commitment
that may have both positive and negative impacts on the project; as
usual, it is all about trade-offs. Semver requires tooling that we do not
yet have in place to find regressions before releases, so that we can fix
them (or bump the major version to respect the semver contract). As a
polyglot project we need these tools for every supported language, and
since all our languages live in the same repository and are released
simultaneously, an incompatible change in one language may trigger a new
major version for the whole project, which does not look like a desirable
outcome.
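To make the contract concrete, the rule such tooling would have to enforce is roughly the following (a minimal sketch of the semver.org numbering rule; classifying a release as "breaking" or "feature" is exactly what the missing tooling would have to do):

```java
public class SemVer {
  /** Computes the next version number per the semver.org rules. */
  static String next(
      int major, int minor, int patch, boolean hasBreakingChange, boolean hasNewFeature) {
    if (hasBreakingChange) {
      return (major + 1) + ".0.0"; // incompatible API change
    } else if (hasNewFeature) {
      return major + "." + (minor + 1) + ".0"; // backwards-compatible feature
    }
    return major + "." + minor + "." + (patch + 1); // backwards-compatible fix
  }
}
```

Under this rule, a single incompatible change anywhere in the repository would have turned 2.20.0 into 3.0.0 rather than 2.21.0.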

For these reasons I think we should soften the claim of using semantic
versioning and produce our own Beam versioning policy, one that is
consistent with our reality and where we can also highlight the lack of
guarantees for code marked as @Internal and @Experimental, as well as for
some modules where we may still want the freedom of not guaranteeing
stability, like runners/core* or any class in the different runners that
is not a PipelineOptions one.

In general, whatever we decide, we should probably not be as strict as
full semver, but we should consider the trade-offs of the policy in
detail. There is an ongoing discussion on versioning in the Apache Spark
community that is really worth reading and that weighs the cost of
breaking an API against the cost of maintaining it [3]. I think we can
use it as inspiration for an initial version.

WDYT?

[1] https://semver.org/
[2] https://beam.apache.org/get-started/downloads/
[3] https://lists.apache.org/thread.html/r82f99ad8c2798629eed66d65f2cddc1ed196dddf82e8e9370f3b7d32%40%3Cdev.spark.apache.org%3E


On Thu, May 28, 2020 at 4:36 PM Reuven Lax <[email protected]> wrote:

> Most of those items are either in APIs marked @Experimental (the
> definition of Experimental in Beam is that we can make breaking changes to
> the API) or are changes in a specific runner - not the Beam API.
>
> Reuven
>
> On Thu, May 28, 2020 at 7:19 AM Ashwin Ramaswami <[email protected]>
> wrote:
>
>> There's a "Breaking Changes" section on this blogpost:
>> https://beam.apache.org/blog/beam-2.21.0/ (and really, for earlier minor
>> versions too)
>>
>> Ashwin Ramaswami
>> Student
>> *Find me on my:* LinkedIn <https://www.linkedin.com/in/ashwin-r> |
>> Website <https://epicfaace.github.io/> | GitHub
>> <https://github.com/epicfaace>
>>
>>
>> On Thu, May 28, 2020 at 10:01 AM Reuven Lax <[email protected]> wrote:
>>
>>> What did we break?
>>>
>>> On Thu, May 28, 2020, 6:31 AM Ashwin Ramaswami <[email protected]>
>>> wrote:
>>>
>>>> Do we really use semantic versioning? It appears we introduced breaking
>>>> changes from 2.20.0 -> 2.21.0. If not, we should update the documentation
>>>> under "API Stability" on this page:
>>>> https://beam.apache.org/get-started/downloads/
>>>>
>>>> What would be a better way to word the way in which we decide version
>>>> numbering?
>>>>
>>>