Hi Neville,
Actually, we already planned to provide SDKs in new languages, including
Scala. So a great move would be for you to "donate" your Scala SDK directly
to Beam. I'm not saying it has to happen now, but I would love to help you
go in this direction. IMHO, if you don't want to donate it to the Beam
project, Beam itself will probably provide its own Scala SDK.
It's the same for the IOs: we already plan to greatly extend the coverage
(HDFS, JMS, MQTT, ...). You can contribute in this area too!
Basically, I really encourage you to join the Beam project: we love
contributions ;)
I would be happy to discuss and help you get involved in the
community.
Thanks!
Regards
JB
On 03/25/2016 08:36 AM, Neville Li wrote:
Thanks guys for the interest. I'm really excited about all the feedback
from the community.
A little background: we developed Scio to bring Google Cloud Dataflow
closer to the Scalding/Spark ecosystem that our developers are familiar
with, while bringing some missing pieces to the table (type-safe BigQuery,
HDFS, and a REPL, to name a few).
I have to admit that I'm pretty new to Beam development, but I would love
to get feedback and advice on how to bring Scio closer to the Beam feature
set and semantics. Scio doesn't have to live in the Beam code base just
yet (we're still under heavy development), but I'd like to see it as a de
facto Scala API endorsed by the Beam community.
@Ismaël: I'm curious, what is this session thing you're referring to?
On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <[email protected]>
wrote:
+Neville and Rafal for their take ;-)
Excited to see this out. Multiple community driven SDKs are right in line
with our goals for Beam.
On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <[email protected]> wrote:
Addendum: actually, the semantic model support is not as far off as I said
before (I hadn't finished reading and I thought they didn't support
sessions). Looking at the git history, the project is not so young
either, and it is quite active.
On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <[email protected]>
wrote:
Hello,
I just had a quick look at the code, and what they have done is interesting.
The SCollection wrapper is worth a look, as are the examples, to get an idea
of their intentions. The fact that the code looks so Spark-like
(distributed-collection style) is quite interesting too:
    val (sc, args) = ContextAndArgs(cmdlineArgs)
    sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
      .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
      .countByValue()
      .map(t => t._1 + ": " + t._2)
      .saveAsTextFile(args("output"))
    sc.close()
They have a REPL, and since the project is a bit young they don't support
all the advanced semantics of Beam. They also have a Hadoop File
Sink/Source. I think it would be nice to work with them, but if that is not
possible, I think it is at least worth coordinating some sharing, e.g. in
the Sink/Source area and other extensions.
Additionally, their code is also under the Apache license.
On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <[email protected]>
wrote:
Hi Raghu,
I agree: we should provide SDKs in different languages, and DSLs for
specific use cases.
You got why I sent my proposal ;)
Regards
JB
On 03/24/2016 07:14 PM, Raghu Angadi wrote:
I would love to see a Scala API properly supported. I didn't know about
scio. Scala is such a natural fit for the Dataflow API.
I am not sure of the policy w.r.t. where such packages would live in the Beam
repo, but I personally would write my Dataflow applications in Scala. It is
probably already the case, but my request would be: it should be as thin as
reasonably possible (that might make it a bit less like the scalding/spark API
in some cases, which I think is a good compromise).
On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <[email protected]>
wrote:
Hi beamers,
Right now, Beam provides a Java SDK.
AFAIK, very soon, you should have the Python SDK ;)
Spotify created a Scala API on top of the Google Dataflow SDK:
https://github.com/spotify/scio
What do you think of asking if they want to donate this as the Beam Scala
SDK?
I planned to work on a Scala SDK, but as it seems there's already
something, it makes sense to leverage it.
Thoughts?
Regards
JB
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com