Hello Neville,

First, congratulations guys, excellent job on the API; the Scalding touches are pretty neat (as well as the Tap abstraction). I am also new to Beam, so believe me, you guys already know more than I do.
In my comment I mentioned sessions, referring to session windows, but that was my mistake: I had only taken a quick look at your code and initially didn't see them. Anyway, if you are interested in the model, there is a good description of the current capabilities of the runners on the website:

https://beam.incubator.apache.org/capability-matrix/

New additions to the model are openly discussed on the mailing list and in the technical docs (e.g. lateness):

https://goo.gl/ps8twC

-Ismaël

On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <[email protected]> wrote:

> Thanks guys for the interest. I'm really excited about all the feedback
> from the community.
>
> A little background: we developed Scio to bring Google Cloud Dataflow
> closer to the Scalding/Spark ecosystem that our developers are familiar
> with while bringing some missing pieces to the table (type safe BigQuery,
> HDFS, REPL to name a few).
>
> I have to admit that I'm pretty new to BEAM development but would love
> to get feedback and advice on how to bring Scio closer to the BEAM feature
> set and semantics. Scio doesn't have to live with the BEAM code base just
> yet (we're still under heavy development) but I'd like to see it as a de
> facto Scala API endorsed by the BEAM community.
>
> @Ismaël: I'm curious, what's this session thing you're referring to?
>
> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <[email protected]>
> wrote:
>
> > +Neville and Rafal for their take ;-)
> >
> > Excited to see this out. Multiple community driven SDKs are right in line
> > with our goals for Beam.
> >
> >
> > On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <[email protected]> wrote:
> >
> > > Addendum: actually the semantic model support is not so far away as I
> > > said before (I haven't finished reading and I thought they didn't
> > > support sessions), and looking at the git history the project is not so
> > > young either and it is quite active.
> > >
> > > On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I just checked the code a bit and what they have done is interesting:
> > > > the SCollection wrapper is worth a look, as well as the examples to get
> > > > an idea of their intentions. The fact that the code looks so Spark-like
> > > > (distributed collections like) is quite interesting too:
> > > >
> > > > val (sc, args) = ContextAndArgs(cmdlineArgs)
> > > > sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> > > >   .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > > >   .countByValue()
> > > >   .map(t => t._1 + ": " + t._2)
> > > >   .saveAsTextFile(args("output"))
> > > > sc.close()
> > > >
> > > > They have a REPL, and since the project is a bit young they don't
> > > > support all the advanced semantics of Beam. They also have a Hadoop File
> > > > Sink/Source. I think it would be nice to work with them, but if that is
> > > > not possible, at least I think it is worth coordinating some sharing,
> > > > e.g. in the Sink/Source area + other extensions.
> > > >
> > > > Additionally, their code is also under the Apache license.
> > > >
> > > >
> > > > On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <[email protected]>
> > > > wrote:
> > > >
> > > >> Hi Raghu,
> > > >>
> > > >> I agree: we should provide SDKs in different languages, and DSLs for
> > > >> specific use cases.
> > > >>
> > > >> You got why I sent my proposal ;)
> > > >>
> > > >> Regards
> > > >> JB
> > > >>
> > > >>
> > > >> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > > >>
> > > >>> I would love to see a Scala API properly supported. I didn't know
> > > >>> about scio. Scala is such a natural fit for the Dataflow API.
> > > >>>
> > > >>> I am not sure of the policy w.r.t. where such packages would live in
> > > >>> the Beam repo, but I personally would write my Dataflow applications
> > > >>> in Scala.
> > > >>> It is probably already the case, but my request would be: it should
> > > >>> be as thin as reasonably possible (that might make it a bit less like
> > > >>> the Scalding/Spark API in some cases, which I think is a good
> > > >>> compromise).
> > > >>>
> > > >>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> Hi beamers,
> > > >>>>
> > > >>>> right now, Beam provides a Java SDK.
> > > >>>>
> > > >>>> AFAIK, very soon, you should have the Python SDK ;)
> > > >>>>
> > > >>>> Spotify created a Scala API on top of the Google Dataflow SDK:
> > > >>>>
> > > >>>> https://github.com/spotify/scio
> > > >>>>
> > > >>>> What do you think of asking if they want to donate this as the Beam
> > > >>>> Scala SDK?
> > > >>>> I planned to work on a Scala SDK, but as it seems there's already
> > > >>>> something, it makes sense to leverage it.
> > > >>>>
> > > >>>> Thoughts?
> > > >>>>
> > > >>>> Regards
> > > >>>> JB
> > > >>>> --
> > > >>>> Jean-Baptiste Onofré
> > > >>>> [email protected]
> > > >>>> http://blog.nanthrax.net
> > > >>>> Talend - http://www.talend.com
> > > >>>>
> > > >>
> > > >> --
> > > >> Jean-Baptiste Onofré
> > > >> [email protected]
> > > >> http://blog.nanthrax.net
> > > >> Talend - http://www.talend.com
