Just to summarize, at this point:

- Everybody agrees that scio is not an SDK.
- Almost everybody agrees that, given the current choices, they would prefer ‘dsls/scio’.
- Some of us are not particularly married to the DSL classification.

I have a proposal: we can define two concepts, each with its own directory structure in the Beam repository:

1. Beam API: A set of abstractions to program the Beam Model in a given programming language. These are idiomatic versions of the Beam Model and ideally should cover it completely; scio is one example.

The directory structure for Beam APIs could be:

apis/scala
apis/clojure
apis/groovy
...

2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g. graphs, machine learning, etc. These represent domain-specific idioms, e.g. a graph DSL would represent graph concepts (edges, vertices, etc.) as first-class citizens.

The directory structure for Beam DSLs could be:

dsls/graph
dsls/ml
dsls/cep
...

Given these definitions, for the concrete scio case I think the most accurate directory would be:

apis/scala

or

apis/scala/scio

I personally prefer the first one (apis/scala) because we don’t have any other Scala API for the moment, and because I think that we shouldn’t have more than one API per language, to avoid confusion. For example, imagine that someone creates apis/java/bcollections to represent Beam Pipelines as distributed collections; that would be confusing. However, I understand the arguments for the second directory, e.g. to support different APIs per language and to preserve their original names (scio). Anyway, I would be OK with either of the two.

I apologize for this long message, and for not choosing either of the two structures proposed in this thread, but I think it is important to be clear about the differences in scope between Beam APIs and DSLs, in particular if we think about new users.

What do you think? Does my proposal make sense? Any suggestions?

Regards,
Ismaël

ps.
One last thing, I found this text that in part corroborates my feeling about scio being an API and not a DSL:

“… a Scala Dataflow API (a nascent open-source version of which already exists, and which seems likely to flower into maturity in due time given Dataflow's move to join the ASF).”

https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison

On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <[email protected]> wrote:

> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <[email protected]> wrote:
>
> > > I love the name scio. But I think sdks/scala might be most appropriate
> > > and would make it a first class citizen for Beam.
> >
> > I am strongly against it being in the 'sdks/' top-level module -- it's not
> > a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
>
> +1. I agree, it is not Beam SDK in that sense.
>
> Raghu.
>
> > > Where would a future python sdk reside?
> >
> > The Python SDK is in the python-sdk branch on Apache already, and it lives
> > in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
