I agree with Kostas. IMO postponing will straight up not work, since people tend to be *very* busy close to a release, even without having to port features to several APIs.

Furthermore, I don't think we will get anywhere by creating one policy to rule them all (especially a rigid one), because there are fundamental differences between a) the APIs and b) the scope of individual features. There is also no point in setting up a policy when it is very likely that we won't abide by it.

With the increasing number of APIs, it's quite a tall order to expect a version for each of them from a single contributor. Even now that would be 3 (Java, Scala, Streaming(?)), with 2 more to come in the somewhat near future (Python, SQL (not sure if relevant)). It is a *massive* entry barrier, as well as a major time investment on the contributor's part. This should also hold for simple features (certainly at the beginning).

If (and only if) Scala is as thin as I am made to believe, I would be for a hard policy here. I would exclude the other APIs from this. The overhead of getting to know all APIs and debugging unfamiliar code would eat up way too much time, which could easily break our neck. It's not just about syncing the APIs, but about doing so in an efficient manner. For them I would much rather have 2-3 people per API who are somewhat responsible for porting these features, preferably in a more concentrated effort (aka batches).

On 27.9.2014 21:03, Kostas Tzoumas wrote:
If we allow out-of-sync APIs (and backends) until the time of a release,
aren't we just postponing the syncing problem to the time of the release,
which is a pretty bad time to have such a problem?


On Fri, Sep 26, 2014 at 8:49 PM, Robert Metzger wrote:

Hi,

I'm also in favor of having a strict policy regarding the Java and Scala
API.
In my understanding, the new Scala API is a thin layer above the Java one,
so adding new methods should be straightforward (given that there are
plenty of examples as a reference).

Robert

On Fri, Sep 26, 2014 at 11:04 AM, Ufuk Celebi wrote:

Hey Fabian,

thanks for bringing this up.

I would vote to have a hard policy regarding the Scala and Java API as
these are our main user facing APIs.

If there was a fundamental problem or language feature which could not be
supported in or ported to the other API, I would be OK if it was only
available in one. But small additions to the APIs like outer joins, which
can be in sync, should also be in sync.

If someone does not want to add the corresponding feature to the other
APIs, I would go for a pull request with a request for someone else to
port the missing part.

I think it is very important for users to be able to assume that all APIs
have the same "power". Otherwise we might end up in a situation (and I
think we already had it with the broadcast variables for a time), where
users have to pick the API that matches their use case rather than the
one they prefer.

Best,

Ufuk

On 26 Sep 2014, at 10:43, Fabian Hueske wrote:

Hi,

as you all know, Flink has a layered architecture with multiple
alternatives for certain levels.
Examples are:
- Programming APIs: Java, Scala, (and Python in progress)
- Processing Backends: distributed runtime (former Nephele), Java
Collections, (and potentially Tez in the future)

The challenge with multiple alternatives that serve the same purpose is
that these should be in sync.
A feature that is added to the Java API should also be added to the Scala
API (and other APIs in the future). The same applies to new runtime
strategies and operators, such as outer joins.

I think we need a policy on how to keep the features of different layer
alternatives in sync.
With the recent update of the Scala API, a ScalaAPICompletenessTest was
added that checks whether the Scala API offers the same methods as the
Java API. Adding a feature to the Java API breaks the build and requires
either adapting the Scala API as well or excluding the added methods from
the ScalaAPICompletenessTest.
While this test is a great tool to make sure that the APIs are synced,
it basically requires that the APIs are always synced, i.e., a modification
of the Java API must go with an equivalent change of the Scala API.
If we make this a tight policy and force compatibility at all times,
contributors must know about several different technologies (Scala
compiler macros, Python, the implementation details of multiple runtime
backends, ...). This sounds like a huge entry barrier to me.
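
For readers unfamiliar with this kind of check, the completeness test described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual ScalaAPICompletenessTest: the class names `ReferenceApi` and `WrapperApi`, the helper `missingMethods`, and the name-only comparison are all hypothetical. It compares the public method names of two classes via reflection and allows an explicit exclusion list.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class ApiCompletenessSketch {

    // Hypothetical stand-in for the reference API (e.g. the Java API).
    public static class ReferenceApi {
        public void map() {}
        public void join() {}
        public void outerJoin() {}
    }

    // Hypothetical stand-in for the API that must be kept in sync.
    public static class WrapperApi {
        public void map() {}
        public void join() {}
    }

    // Collects the names of public, non-synthetic methods declared on the class itself.
    static Set<String> publicMethodNames(Class<?> clazz) {
        Set<String> names = new TreeSet<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && !m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    // Returns reference methods that are neither offered by the wrapper nor excluded.
    static Set<String> missingMethods(Class<?> reference, Class<?> wrapper, Set<String> excluded) {
        Set<String> missing = publicMethodNames(reference);
        missing.removeAll(publicMethodNames(wrapper));
        missing.removeAll(excluded);
        return missing;
    }

    public static void main(String[] args) {
        // Without exclusions the check reports the method that was not ported.
        System.out.println(missingMethods(
                ReferenceApi.class, WrapperApi.class, Collections.<String>emptySet()));
        // Excluding the method explicitly makes the check pass (empty result).
        System.out.println(missingMethods(
                ReferenceApi.class, WrapperApi.class, Collections.singleton("outerJoin")));
    }
}
```

In a build, a non-empty result would fail the test, which is what forces a contributor either to port the method or to add it to the exclusion list.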

To make it clear, I am definitely in favor of keeping APIs and backends in
sync.
However, I propose to enforce this only for releases, i.e., allow
out-of-sync APIs on the master branch and fix the APIs for releases.
With this additional requirement, we also need to think twice about which
features to add, as multiple components of the system will be affected.

What do you guys think?

