I like the idea of having a single PR for a feature that touches different components (APIs, backends) and having multiple people contribute to it to make it work for all alternatives. This would ensure a synced code base, but it would take much more time to get new features in. This might be a problem if a feature is required for other features or asked for by some users.
I am not sure if the argument of increased workload towards a release is true. If a feature should go into a release, it must be implemented for all APIs anyway. Maybe the chance that this is done at the end of a release cycle is even higher if the feature is lingering in a PR and is available for only a subset of the APIs. But who knows... Chesnay also has a point here. We might want to distinguish between first-class APIs (backends), which are always in sync, and others, which might be a bit behind...

2014-09-29 9:56 GMT+02:00 Aljoscha Krettek <[email protected]>:

> We could use blocking issues on Jira to mark things that need to be resolved before a release.
>
> On Sat, Sep 27, 2014 at 11:53 PM, Chesnay Schepler <[email protected]> wrote:
>
> > I agree with Kostas, and believe that postponing will imo straight up not work since people tend to be *very* busy close to a release, even without having to port features to several APIs.
> >
> > I furthermore don't think we will get anywhere by creating one policy to rule them all (especially a rigid one), because there are fundamental differences between a) the APIs b) scope of a feature; and there not being a point in setting up a policy when it is very likely that we won't abide by it.
> >
> > With the increasing number of APIs it's quite a tall order expecting a version for each of them from a single contributor. Even now that would be 3 (Java, Scala, Streaming(?)) with 2 more to come in the somewhat near future (Python, SQL (not sure if relevant)). It is a *massive* entry barrier, as well as a major time investment on the contributor's part. This should also hold for simple features (certainly at the beginning).
> >
> > If (and only if) Scala is as thin as I am made to believe I would be for a hard policy here. I would exclude other APIs from this.
> > The overhead from getting to know all APIs and debugging unfamiliar code would eat up way too much time, which could easily break our neck. It's not just about syncing the APIs, but doing so in an efficient manner. For them I would much rather have 2-3 people per API that are somewhat responsible for porting these features, preferably in a more concentrated effort (aka batches).
> >
> > On 27.9.2014 21:03, Kostas Tzoumas wrote:
> >
> >> If we allow out-of-sync APIs (and backends) until the time of a release, aren't we just postponing the syncing problem to the time of the release, which is a pretty bad time to have such a problem?
> >>
> >> On Fri, Sep 26, 2014 at 8:49 PM, Robert Metzger <[email protected]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm also in favor of having a strict policy regarding the Java and Scala API.
> >>> In my understanding, the new Scala API is a thin layer above the Java one, so adding new methods should be straightforward (given that there are plenty of examples as a reference).
> >>>
> >>> Robert
> >>>
> >>> On Fri, Sep 26, 2014 at 11:04 AM, Ufuk Celebi <[email protected]> wrote:
> >>>
> >>>> Hey Fabian,
> >>>>
> >>>> thanks for bringing this up.
> >>>>
> >>>> I would vote to have a hard policy regarding the Scala and Java API as these are our main user facing APIs.
> >>>>
> >>>> If there was a fundamental problem or language feature, which could not be supported/ported in/to the other API, I would be OK if it was only available in one. But small additions to the APIs like outer joins, which can be in sync should also be in sync.
> >>>>
> >>>> If someone does not want to add the corresponding feature to the other APIs, I would go for a pull request with a request for someone else to port the missing part.
> >>>>
> >>>> I think it is very important for users to be able to assume that all APIs have the same "power". Otherwise we might end up in a situation (and I think we already had it with the broadcast variables for a time), where users have to pick the API, which matches their use case and not their preference.
> >>>>
> >>>> Best,
> >>>>
> >>>> Ufuk
> >>>>
> >>>> On 26 Sep 2014, at 10:43, Fabian Hueske <[email protected]> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> as you all know, Flink has a layered architecture with multiple alternatives for certain levels.
> >>>>> Examples are:
> >>>>> - Programming APIs: Java, Scala, (and Python in progress)
> >>>>> - Processing Backends: distributed runtime (former Nephele), Java Collections, (and potentially Tez in the future)
> >>>>>
> >>>>> The challenge with multiple alternatives that serve the same purpose is that these should be in sync.
> >>>>> A feature that is added to the Java API should also be added to the Scala API (and other APIs in the future). The same applies to new runtime strategies and operators, such as outer joins.
> >>>>>
> >>>>> I think we need a policy how to keep the features of different layer alternatives in sync.
> >>>>> With the recent update of the Scala API, a ScalaAPICompletenessTest was added that checks whether the Scala API offers the same methods as the Java API. Adding a feature to the Java API breaks the build and requires to either adapt the Scala API as well or exclude the added methods from the APICompletenessTest.
> >>>>> While this test is a great tool to make sure that the APIs are synced, this basically requires that APIs are always synced, i.e., a modification of the Java API must go with an equivalent change of the Scala API.
> >>>>> If we make this a tight policy and force compatibility at all times, contributors must know about several different technologies (Scala Compiler Macros, Python, the implementation details of multiple runtime backends, ...). This sounds like a huge entrance barrier to me.
> >>>>>
> >>>>> To make it clear, I am definitely in favor of keeping APIs and backends in sync.
> >>>>> However, I propose to enforce this only for releases, i.e., allow out-of-sync APIs on the master branch and fix the APIs for releases.
> >>>>> With this additional requirement, we also need to think twice which features to add as multiple components of the system will be affected.
> >>>>>
> >>>>> What do you guys think?
