Just to summarize, at this point:

- Everybody agrees that scio is not an SDK.
- Almost everybody agrees that, given the current choices, they would prefer
‘dsls/scio’.
- Some of us are not particularly married to the DSL classification.

I have a proposal: we can define two concepts, each with its own directory
structure in the Beam repository:

1. Beam API: A set of abstractions to program the complete Beam Model in a
given programming language.

These are idiomatic versions of the Beam Model and ideally should cover
the complete Beam Model; scio is one example. The directory structure
for Beam APIs could be:

apis/scala
apis/clojure
apis/groovy
...

2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
graphs, machine learning, etc.

These represent domain-specific idioms, e.g. a graph DSL would represent
graph concepts (edges, vertices, etc.) as first-class citizens. The
directory structure for Beam DSLs could be:

dsls/graph
dsls/ml
dsls/cep
...

Given these definitions, for the concrete scio case I think the most
accurate directory would be:

apis/scala
or
apis/scala/scio

I personally prefer the first one (apis/scala) because we don’t have any
other Scala API for the moment, and because I think we shouldn’t have more
than one API per language, to avoid confusion. For example, imagine that
someone creates apis/java/bcollections to represent Beam Pipelines as
distributed collections; that would be confusing. However, I understand the
arguments for the second directory, e.g. to support different APIs per
language and to preserve their original names (scio). Anyway, I would be OK
with either of the two.

I apologize for this long message, and for not choosing either of the two
structures proposed in this thread, but I think it is important to be clear
about the difference in scope between Beam APIs and DSLs, in particular if
we think about new users.

What do you think? Does my proposal make sense? Any suggestions?

Regards,
Ismaël

PS: One last thing, I found this text that in part corroborates my feeling
about scio being an API and not a DSL:

“… a Scala Dataflow API (a nascent open-source version of which already
exists, and which seems likely to flower into maturity in due time given
Dataflow's move to join the ASF).”
https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison


On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <[email protected]>
wrote:

> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <[email protected]
> >
> wrote:
>
> > > I love the
> > > name scio. But I think sdks/scala might be most appropriate and would
> > make
> > > it a first class citizen for Beam.
> > >
> >
> > I am strongly against it being in the 'sdks/' top-level module -- it's
> not
> > a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> >
>
> +1. I agree, it is not Beam SDK in that sense.
>
> Raghu.
>
>
> >
> > > Where would a future python sdk reside?
> > >
> >
> > The Python SDK is in the python-sdk branch on Apache already, and it
> lives
> > in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
>
