Re: [PROPOSAL] New sdk languages

Ismaël Mejía Fri, 25 Mar 2016 00:56:24 -0700

Hello Neville,

First, congratulations guys: excellent job on the API. The Scalding touches
are pretty neat (as is the Tap abstraction). I am also new to Beam, so
believe me, you guys already know more than I do.


In my comment, by sessions I meant session windows. It was my mistake: I had
only taken a quick look at your code and initially didn't see them. Anyway,
if you are interested in the model, there is a good description of the
current capabilities of the runners on the website:

https://beam.incubator.apache.org/capability-matrix/

The new additions to the model are openly discussed on the mailing list
and in the technical docs (e.g. lateness):

https://goo.gl/ps8twC

-Ismaël

On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <[email protected]> wrote:

> Thanks guys for the interest. I'm really excited about all the feedback
> from the community.
>
> A little background: we developed Scio to bring Google Cloud Dataflow
> closer to the Scalding/Spark ecosystem that our developers are familiar
> with while bringing some missing pieces to the table (type safe BigQuery,
> HDFS, REPL to name a few).
>
> I have to admit that I'm pretty new to BEAM development but would love
> to get feedback and advice on how to bring Scio closer to the BEAM feature
> set and semantics. Scio doesn't have to live with the BEAM code base just
> yet (we're still under heavy development) but I'd like to see it as a de
> facto Scala API endorsed by the BEAM community.
>
> @Ismaël: I'm curious, what is this session thing you're referring to?
>
> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <[email protected]>
> wrote:
>
> > +Neville and Rafal for their take ;-)
> >
> > Excited to see this out. Multiple community driven SDKs are right in line
> > with our goals for Beam.
> >
> >
> > On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <[email protected]> wrote:
> >
> > > Addendum: actually the semantic model support is not as far off as I
> > > said before (I hadn't finished reading and I thought they didn't
> > > support sessions), and looking at the git history the project is not
> > > so young either and it is quite active.
> > >
> > > On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I just had a quick look at the code and what they have done is
> > > > interesting. The SCollection wrapper is worth a look, as are the
> > > > examples, to get an idea of their intentions. The fact that the code
> > > > looks so Spark-like (distributed-collection style) is also quite
> > > > interesting:
> > > >
> > > >     val (sc, args) = ContextAndArgs(cmdlineArgs)
> > > >     sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> > > >       .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > > >       .countByValue()
> > > >       .map(t => t._1 + ": " + t._2)
> > > >       .saveAsTextFile(args("output"))
> > > >     sc.close()
> > > >
> > > > They have a REPL, and since the project is a bit young they don't
> > > > support all the advanced semantics of Beam. They also have a Hadoop
> > > > File Sink/Source. I think it would be nice to work with them, but if
> > > > that is not possible, I think it is at least worth coordinating some
> > > > sharing, e.g. in the Sink/Source area and other extensions.
> > > >
> > > > Additionally, their code is also under the Apache license.
> > > >
> > > >
> > > > On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> > > >
> > > >> Hi Raghu,
> > > >>
> > > >> I agree: we should provide SDKs in different languages, and DSLs for
> > > >> specific use cases.
> > > >>
> > > >> You got why I sent my proposal ;)
> > > >>
> > > >> Regards
> > > >> JB
> > > >>
> > > >>
> > > >> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > > >>
> > > >>> I would love to see a Scala API properly supported. I didn't know
> > > >>> about scio. Scala is such a natural fit for the Dataflow API.
> > > >>>
> > > >>> I am not sure of the policy w.r.t. where such packages would live in
> > > >>> the Beam repo, but I personally would write my Dataflow applications
> > > >>> in Scala. It is probably already the case, but my request would be:
> > > >>> it should be as thin as reasonably possible (that might make it a
> > > >>> bit less like the scalding/spark API in some cases, which I think is
> > > >>> a good compromise).
> > > >>>
> > > >>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> > > >>>
> > > >>>> Hi beamers,
> > > >>>>
> > > >>>> right now, Beam provides a Java SDK.
> > > >>>>
> > > >>>> AFAIK, very soon, you should have the Python SDK ;)
> > > >>>>
> > > >>>> Spotify created a Scala API on top of the Google Dataflow SDK:
> > > >>>>
> > > >>>> https://github.com/spotify/scio
> > > >>>>
> > > >>>> What do you think of asking if they want to donate this as the Beam
> > > >>>> Scala SDK? I planned to work on a Scala SDK, but as it seems there's
> > > >>>> already something, it makes sense to leverage it.
> > > >>>>
> > > >>>> Thoughts ?
> > > >>>>
> > > >>>> Regards
> > > >>>> JB
> > > >>>> --
> > > >>>> Jean-Baptiste Onofré
> > > >>>> [email protected]
> > > >>>> http://blog.nanthrax.net
> > > >>>> Talend - http://www.talend.com
> > > >>>>
> > > >>>>
> > > >>>
> > > >> --
> > > >> Jean-Baptiste Onofré
> > > >> [email protected]
> > > >> http://blog.nanthrax.net
> > > >> Talend - http://www.talend.com
> > > >>
> > > >
> > > >
> > >
> >
>
