I don't think that sdks/scala is the right place -- scio is not a Beam Scala SDK; it wraps the existing Java SDK.
Some options:
* sdks/java/extensions (Scio builds on the Java SDK) -- mentally vetoed since Scio isn't an extension for the Java SDK, but rather a wrapper
* dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
* dsls/scio (Scio is a Beam DSL that could eventually use multiple SDKs)
* extensions/java/scio (Scio is an extension of Beam that uses the Java SDK)
* extensions/scio (Scio is an extension of Beam that is not limited to one SDK)

I lean towards either dsls/java/scio or extensions/java/scio, since I don't think there are plans for Scio to handle multiple different SDKs (in different languages). The question between these two is whether we think DSLs are "big enough" to be a top level concept.

On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <[email protected]> wrote:

> Good point about new Fn and the fact it's based on the Java SDK.
>
> It's just that in term of "marketing", it's a good message to provide a
> Scala SDK even if technically it's more a DSL.
>
> For instance, a valid "marketing" DSL would be a Java fluent DSL on top of
> the Java SDK, or a declarative XML DSL.
>
> However, from a technical perspective, it can go into dsl module.
>
> My $0.02 ;)
>
> Regards
> JB
>
>
> On 06/24/2016 06:51 AM, Frances Perry wrote:
>
>> +Rafal & Andrew again
>>
>> I am leaning DSL for two reasons: (1) scio uses the existing java execution
>> environment (and won't have a language-specific fn harness of its own), and
>> (2) it changes the abstractions that users interact with.
>>
>> I recently saw a scio repl demo from Reuven -- there's some really cool
>> stuff in there. I'd love to dive into it a bit more and see what can be
>> generalized beyond scio. The repl-like interactive graph construction is
>> very similar to what we've seen with ipython, in that it doesn't always
>> play nicely with the graph construction / graph execution distinction. I
>> wonder what changes to Beam might more generally support this. The
>> materialize stuff looks similar to some functionality in FlumeJava we used
>> to support multi-segment pipelines with some shared intermediate
>> PCollections.
>>
>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>> Hi Neville,
>>>
>>> thanks for the update !
>>>
>>> As it's another language support, and to clearly identify the purpose, I
>>> would say sdks/scala.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 06/23/2016 11:56 PM, Neville Li wrote:
>>>
>>> +folks in my team
>>>>
>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li <[email protected]>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>>>
>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> and am in the
>>>>> progress of moving code to Beam (BEAM-302
>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just wondering if
>>>>> sdks/scala is the right place for this code or if something like dsls/scio
>>>>> is a better choice? What do you think?
>>>>>
>>>>> A little background: Scio was built as a high-level Scala API for Google
>>>>> Cloud Dataflow (now also Apache Beam) and is heavily influenced by Spark
>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while also
>>>>> providing features comparable to other Scala data frameworks. We use Scio
>>>>> on Dataflow for production extensively inside Spotify.
>>>>>
>>>>> Cheers,
>>>>> Neville
>>>>>
>>>>>
>>>>>
>>>> --
>>> Jean-Baptiste Onofré
>>> [email protected]
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
