Looks like dsls/scio is the winner :) I like it too, plus we get to keep the Scio name. This also leaves room for other Scala wrappers of different flavors. Scio is a DSL in the domain of functional-style data pipelines.
On Mon, Jun 27, 2016 at 3:55 AM Ismaël Mejía <[email protected]> wrote:

> Just to summarize, at this point:
>
> - Everybody agrees that scio is not an SDK.
> - Almost everybody agrees that, given the current choices, they would prefer
> ‘dsls/scio’.
> - Some of us are not particularly married to the DSL classification.
>
> I have a proposal: we can define two concepts, each with its own
> structure in the Beam repository:
>
> 1. Beam API: A set of abstractions to program the complete Beam Model in a
> given programming language.
>
> These are idiomatic versions of the Beam Model, and ideally should cover
> the complete Beam Model; scio is one example. The directory structure
> for Beam APIs could be:
>
> apis/scala
> apis/clojure
> apis/groovy
> ...
>
> 2. Beam DSL: A domain-specific set of abstractions that run on Beam, e.g.
> graphs, machine learning, etc.
>
> These represent domain-specific idioms, e.g. a graph DSL would represent
> graph concepts (edges, vertices, etc.) as first-class citizens. The
> directory structure for Beam DSLs could be:
>
> dsls/graph
> dsls/ml
> dsls/cep
> ...
>
> Given these definitions, for the concrete scio case I think the most
> accurate directory would be:
>
> apis/scala
> or
> apis/scala/scio
>
> I personally prefer the first one (apis/scala) because we don’t have any
> other Scala API for the moment, and because I think that we shouldn’t have
> more than one API per language, to avoid confusion. For example, imagine
> that someone creates apis/java/bcollections to represent Beam Pipelines as
> distributed collections; that would be confusing. However, I understand
> the arguments for the second directory, e.g. to support different APIs per
> language and to preserve their original names (scio). Anyway, I would be OK
> with either of the two.
>
> I apologize for this long message, and for not choosing either of the two
> structures proposed in this thread, but I think it is important to be clear
> about the differences in scope between Beam APIs and DSLs, in particular if
> we think about new users.
>
> What do you think? Does my proposal make sense? Any suggestions?
>
> Regards,
> Ismaël
>
> ps. One last thing: I found this text, which partly corroborates my feeling
> about scio being an API and not a DSL:
>
> “… a Scala Dataflow API (a nascent open-source version of which already
> exists, and which seems likely to flower into maturity in due time given
> Dataflow's move to join the ASF).”
> https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison
>
> On Mon, Jun 27, 2016 at 4:52 AM, Raghu Angadi <[email protected]>
> wrote:
>
> > On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin <[email protected]>
> > wrote:
> >
> > > > I love the name scio. But I think sdks/scala might be most
> > > > appropriate and would make it a first-class citizen for Beam.
> > >
> > > I am strongly against it being in the 'sdks/' top-level module -- it's
> > > not a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> >
> > +1. I agree, it is not a Beam SDK in that sense.
> >
> > Raghu.
> >
> > > > Where would a future Python SDK reside?
> > >
> > > The Python SDK is in the python-sdk branch on Apache already, and it
> > > lives in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
