Also joining a bit late. I agree with Amit: HDFS improvements are a really
good thing to have before the stable release. I will also add the
IOChannelFactory refactorings needed to support things like
Read.from("hdfs://"), aka BEAM-59.

In the worst case, particular IOs can still be marked as experimental to
show users that they may still evolve, even after the first 'stable'
release. What we have to pay more attention to is not breaking the core
SDK. Data Locality (BEAM-673) is where I am afraid we could have some
breaking changes, because there is currently no way for the IOs
(Source/Sink) to send 'a hint' to the runner about Data Locality (please
correct me if I am wrong). Even if no runner supports it in the first
stable release, this would be a really great thing to have, and I think
now is a good moment to do it, to avoid breaking any IO/runner signature
later because of new methods.
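
To make the idea a bit more concrete, here is a minimal sketch of what such
a hint could look like. Everything in it is hypothetical (the names Bundle,
preferred_hosts and assign are made up for illustration and are not Beam
API), and it is written in Python only for brevity. The point is that an
optional hint with an empty default lets existing sources keep working
unchanged, while a runner is free to ignore it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Bundle:
    """Hypothetical unit of work produced by splitting a source."""
    path: str
    start: int
    end: int
    # Hypothetical locality hint: hosts that hold this byte range locally.
    # The empty default means sources that know nothing about locality
    # do not need to change.
    preferred_hosts: Sequence[str] = field(default_factory=tuple)


def assign(bundles: List[Bundle], workers: List[str]) -> Dict[Tuple[str, int], str]:
    """Toy runner-side placement: honor the hint if possible, else round-robin."""
    placement = {}
    for i, bundle in enumerate(bundles):
        # Keep only the hinted hosts that are actually available as workers.
        local = [h for h in bundle.preferred_hosts if h in workers]
        chosen = local[0] if local else workers[i % len(workers)]
        placement[(bundle.path, bundle.start)] = chosen
    return placement
```

A runner that does not support locality would simply never read
preferred_hosts, which is why adding the hint early should not force a
signature change later.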

What do the others think?
Ismaël



On Tue, Feb 28, 2017 at 6:29 PM, Amit Sela <amitsel...@gmail.com> wrote:

> Joining in just a bit late, I'll be quick and say that IMHO the SDK is
> mature enough and so my only point to add is *HDFS support*.
> I think that in terms of adoption we have to support HDFS as a "first-class
> citizen" via the FileSystem API, and provide data locality (batch) on top
> of it - it serves not only HDFS, but other eco-system IOs such as HBase.
> From my experience with talking to people and companies, most are running
> batch in production with some streaming POC or even production use, but
> batch still takes most of production work. If we give them the same
> production results, with the Beam API, we can on-board them faster and make
> it easier for them to adopt streaming as well.
>
> Thanks,
> Amit
>
> On Tue, Feb 28, 2017 at 7:12 PM Davor Bonaci <da...@apache.org> wrote:
>
> > Alright -- sounds like we have a consensus to proceed with the first
> > stable release after 0.6.0, targeting end of March / early April. I'll
> > kick off separate threads for specific decisions we need to make.
> >
> > On Thu, Feb 23, 2017 at 6:07 AM, Aljoscha Krettek <aljos...@apache.org>
> > wrote:
> >
> > > I think we're ready for this! The public APIs are in very good shape,
> > > especially now that we have the new DoFn, user facing state and timers
> > > and splittable DoFn. Not all Runners support the more advanced features
> > > but we can work on this after a stable release and there are enough
> > > runners that support a large part of the features.
> > >
> > > Best,
> > > Aljoscha
> > >
> > > On Thu, 23 Feb 2017 at 06:15 Kenneth Knowles <k...@google.com.invalid>
> > > wrote:
> > >
> > > > > On Wed, Feb 22, 2017 at 5:35 PM, Chamikara Jayalath
> > > > > <chamik...@apache.org> wrote:
> > > > >
> > > > > > I think this point applies to the Python SDK as well (though, as you
> > > > > > mentioned, API hiding in Python is a mere convention (prefix with
> > > > > > underscore), not enforced). We already have a mechanism for marking
> > > > > > APIs as deprecated, which might be useful here:
> > > > > > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/utils/annotations.py
> > > > >
> > > > > - Cham
> > > > >
> > > >
> > > > > Perhaps an explicit @public annotation would fit. I could imagine
> > > > > easily generating a spec to check against from such annotations,
> > > > > though tooling is secondary to documentation.
> > > >
> > > > Kenn
> > > >
> > >
> >
>
