I just looked at some Scio examples and saw Spark Scala code ;-) To me this makes some sense: Spark is written in Scala (let's call it the Scala SDK?) but it also provides a Java API, and the new version has a unified API (Java-Scala interop). I see Scio in a similar way: it's a Scala API because it's built on top of the Java SDK. Having said that, Scio could offer more than just a Scala API over the Java SDK (e.g., the REPL), so for lack of a native fit, I'd go with DSL. And to address the very valid points people made about being able to say "Hi, we support Scala!", we can still call it the Scala API, even if it lives under dsls/scio.
So +1 for dsls/scio

Thanks,
Amit

On Sat, Jun 25, 2016 at 5:06 AM Dan Halperin wrote:

> On Fri, Jun 24, 2016 at 7:05 PM, Dan Halperin wrote:
>
> > On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi wrote:
> >
> >> DSL is a pretty generic term..
> >
> > I agree and am not married to it. Neville?
> >
> >> The fact that scio uses Java SDK is an implementation detail.
> >
> > Reasonable, which is why I am also not pushing hard for '/java/scio' to
> > be in the path.
> >
> >> I love the name scio. But I think sdks/scala might be most appropriate
> >> and would make it a first class citizen for Beam.
> >
> > I am strongly against it being in the 'sdks/' top-level module -- it's
> > not a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.
> >
> >> Where would a future python sdk reside?
> >
> > The Python SDK is in the python-sdk branch on Apache already, and it
> > lives in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)
>
> Now with a link:
> https://github.com/apache/incubator-beam/tree/python-sdk/sdks
>
> > Thanks,
> > Dan
> >
> >> On Fri, Jun 24, 2016 at 1:50 PM, Jean-Baptiste Onofré wrote:
> >>
> >> > Agree for dsls/scio
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On 06/24/2016 10:22 PM, Lukasz Cwik wrote:
> >> >
> >> >> +1 for dsls/scio for the already listed reasons
> >> >>
> >> >> On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla wrote:
> >> >>
> >> >>> Hello. When it comes to SDK vs DSL - I fully agree with Frances.
> >> >>> About dsls/java/scio or dsls/scio - dsls/java/scio may cause
> >> >>> confusion, scio is a scala DSL but lives under java directory (?) -
> >> >>> that makes sense only once you get that scio is using java SDK
> >> >>> under the hood. Thus, +1 to dsls/scio.
> >> >>> - Rafal
> >> >>>
> >> >>> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles wrote:
> >> >>>
> >> >>>> My +1 goes to dsls/scio. It already has a cool name, so let's use
> >> >>>> it. And there might be other Scala-based DSLs.
> >> >>>>
> >> >>>> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía wrote:
> >> >>>>
> >> >>>>> Hello everyone,
> >> >>>>>
> >> >>>>> Neville, thanks a lot for your contribution. Your work is amazing
> >> >>>>> and I am really happy that this scala integration is finally
> >> >>>>> happening. Congratulations to you and your team.
> >> >>>>>
> >> >>>>> I *strongly* disagree about the DSL classification for scio for
> >> >>>>> one reason, if you go to the root of the term, Domain Specific
> >> >>>>> Languages are about a domain, and the domain in this case is
> >> >>>>> writing Beam pipelines, which is a really broad domain.
> >> >>>>>
> >> >>>>> I agree with Frances’ argument that scio is not an SDK e.g. it
> >> >>>>> reuses the existing Beam java SDK. My proposition is that scio
> >> >>>>> will be called the Scala API because in the end this is what it
> >> >>>>> is. I think the confusion comes from the common definition of SDK
> >> >>>>> which is normally an API + a Runtime. In this case scio will share
> >> >>>>> the runtime with what we call the Beam Java SDK.
> >> >>>>>
> >> >>>>> One additional point of using the term API is that it sends the
> >> >>>>> clear message that Beam has a Scala API too (which is good for
> >> >>>>> visibility as JB mentioned).
> >> >>>>>
> >> >>>>> Regards,
> >> >>>>> Ismaël
> >> >>>>>
> >> >>>>> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré wrote:
> >> >>>>>
> >> >>>>>> Hi Dan,
> >> >>>>>>
> >> >>>>>> fair enough.
> >> >>>>>>
> >> >>>>>> As I'm also working on new DSLs (XML, JSON), I already created
> >> >>>>>> the dsls module.
> >> >>>>>>
> >> >>>>>> So, I would say dsls/scala.
> >> >>>>>>
> >> >>>>>> WDYT ?
> >> >>>>>>
> >> >>>>>> Regards
> >> >>>>>> JB
> >> >>>>>>
> >> >>>>>> On 06/24/2016 05:07 PM, Dan Halperin wrote:
> >> >>>>>>
> >> >>>>>>> I don't think that sdks/scala is the right place -- scio is not
> >> >>>>>>> a Beam Scala SDK; it wraps the existing Java SDK.
> >> >>>>>>>
> >> >>>>>>> Some options:
> >> >>>>>>> * sdks/java/extensions (Scio builds on the Java SDK) -- mentally
> >> >>>>>>>   vetoed since Scio isn't an extension for the Java SDK, but
> >> >>>>>>>   rather a wrapper
> >> >>>>>>> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
> >> >>>>>>> * dsls/scio (Scio is a Beam DSL that could eventually use
> >> >>>>>>>   multiple SDKs)
> >> >>>>>>> * extensions/java/scio (Scio is an extension of Beam that uses
> >> >>>>>>>   the Java SDK)
> >> >>>>>>> * extensions/scio (Scio is an extension of Beam that is not
> >> >>>>>>>   limited to one SDK)
> >> >>>>>>>
> >> >>>>>>> I lean towards either dsls/java/scio or extensions/java/scio,
> >> >>>>>>> since I don't think there are plans for Scio to handle multiple
> >> >>>>>>> different SDKs (in different languages). The question between
> >> >>>>>>> these two is whether we think DSLs are "big enough" to be a top
> >> >>>>>>> level concept.
> >> >>>>>>>
> >> >>>>>>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré wrote:
> >> >>>>>>>
> >> >>>>>>>> Good point about new Fn and the fact it's based on the Java
> >> >>>>>>>> SDK.
> >> >>>>>>>>
> >> >>>>>>>> It's just that in term of "marketing", it's a good message to
> >> >>>>>>>> provide a Scala SDK even if technically it's more a DSL.
> >> >>>>>>>>
> >> >>>>>>>> For instance, a valid "marketing" DSL would be a Java fluent
> >> >>>>>>>> DSL on top of the Java SDK, or a declarative XML DSL.
> >> >>>>>>>>
> >> >>>>>>>> However, from a technical perspective, it can go into dsl
> >> >>>>>>>> module.
> >> >>>>>>>>
> >> >>>>>>>> My $0.02 ;)
> >> >>>>>>>>
> >> >>>>>>>> Regards
> >> >>>>>>>> JB
> >> >>>>>>>>
> >> >>>>>>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
> >> >>>>>>>>
> >> >>>>>>>>> +Rafal & Andrew again
> >> >>>>>>>>>
> >> >>>>>>>>> I am leaning DSL for two reasons: (1) scio uses the existing
> >> >>>>>>>>> java execution environment (and won't have a language-specific
> >> >>>>>>>>> fn harness of its own), and (2) it changes the abstractions
> >> >>>>>>>>> that users interact with.
> >> >>>>>>>>>
> >> >>>>>>>>> I recently saw a scio repl demo from Reuven -- there's some
> >> >>>>>>>>> really cool stuff in there. I'd love to dive into it a bit
> >> >>>>>>>>> more and see what can be generalized beyond scio.
> >> >>>>>>>>> The repl-like interactive graph construction is very similar
> >> >>>>>>>>> to what we've seen with ipython, in that it doesn't always
> >> >>>>>>>>> play nicely with the graph construction / graph execution
> >> >>>>>>>>> distinction. I wonder what changes to Beam might more
> >> >>>>>>>>> generally support this. The materialize stuff looks similar to
> >> >>>>>>>>> some functionality in FlumeJava we used to support
> >> >>>>>>>>> multi-segment pipelines with some shared intermediate
> >> >>>>>>>>> PCollections.
> >> >>>>>>>>>
> >> >>>>>>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré wrote:
> >> >>>>>>>>>
> >> >>>>>>>>>> Hi Neville,
> >> >>>>>>>>>>
> >> >>>>>>>>>> thanks for the update !
> >> >>>>>>>>>>
> >> >>>>>>>>>> As it's another language support, and to clearly identify the
> >> >>>>>>>>>> purpose, I would say sdks/scala.
> >> >>>>>>>>>>
> >> >>>>>>>>>> Regards
> >> >>>>>>>>>> JB
> >> >>>>>>>>>>
> >> >>>>>>>>>> On 06/23/2016 11:56 PM, Neville Li wrote:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> +folks in my team
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hi all,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio>
> >> >>>>>>>>>>>> and am in the progress of moving code to Beam (BEAM-302
> >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>).
> >> >>>>>>>>>>>> Just wondering if sdks/scala is the right place for this
> >> >>>>>>>>>>>> code or if something like dsls/scio is a better choice?
> >> >>>>>>>>>>>> What do you think?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> A little background: Scio was built as a high-level Scala
> >> >>>>>>>>>>>> API for Google Cloud Dataflow (now also Apache Beam) and is
> >> >>>>>>>>>>>> heavily influenced by Spark and Scalding. It wraps around
> >> >>>>>>>>>>>> the Dataflow/Beam Java SDK while also providing features
> >> >>>>>>>>>>>> comparable to other Scala data frameworks. We use Scio on
> >> >>>>>>>>>>>> Dataflow for production extensively inside Spotify.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Cheers,
> >> >>>>>>>>>>>> Neville
> >> >>>>>>>>>>
> >> >>>>>>>>>> --
> >> >>>>>>>>>> Jean-Baptiste Onofré
> >> >>>>>>>>>> http://blog.nanthrax.net
> >> >>>>>>>>>> Talend - http://www.talend.com
