On Fri, Jun 24, 2016 at 2:03 PM, Raghu Angadi <[email protected]> wrote:
> DSL is a pretty generic term..
>

I agree and am not married to it. Neville?

> The fact that scio uses Java SDK is an implementation detail.

Reasonable, which is why I am also not pushing hard for '/java/scio' to be in the path.

> I love the
> name scio. But I think sdks/scala might be most appropriate and would make
> it a first class citizen for Beam.
>

I am strongly against it being in the 'sdks/' top-level module -- it's not a Beam SDK. Unlike DSL, SDK is a very specific term in Beam.

> Where would a future python sdk reside?
>

The Python SDK is in the python-sdk branch on Apache already, and it lives in `sdks/python`. (And it is aiming to become a proper Beam SDK. ;)

Thanks,
Dan

> On Fri, Jun 24, 2016 at 1:50 PM, Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > Agree for dsls/scio
> >
> > Regards
> > JB
> >
> > On 06/24/2016 10:22 PM, Lukasz Cwik wrote:
> >
> >> +1 for dsls/scio for the already listed reasons
> >>
> >> On Fri, Jun 24, 2016 at 11:21 AM, Rafal Wojdyla <[email protected]>
> >> wrote:
> >>
> >>> Hello. When it comes to SDK vs DSL - I fully agree with Frances. About
> >>> dsls/java/scio or dsls/scio - dsls/java/scio may cause confusion: scio is a
> >>> Scala DSL but lives under the java directory (?), which makes sense only
> >>> once you get that scio is using the Java SDK under the hood. Thus, +1 to
> >>> dsls/scio.
> >>> - Rafal
> >>>
> >>> On Fri, Jun 24, 2016 at 2:01 PM, Kenneth Knowles <[email protected]>
> >>> wrote:
> >>>
> >>>> My +1 goes to dsls/scio. It already has a cool name, so let's use it. And
> >>>> there might be other Scala-based DSLs.
> >>>>
> >>>> On Fri, Jun 24, 2016 at 8:39 AM, Ismaël Mejía <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> Neville, thanks a lot for your contribution. Your work is amazing and I
> >>>>> am really happy that this Scala integration is finally happening.
> >>>>> Congratulations to you and your team.
> >>>>>
> >>>>> I *strongly* disagree about the DSL classification for scio for one
> >>>>> reason: if you go to the root of the term, Domain Specific Languages are
> >>>>> about a domain, and the domain in this case is writing Beam pipelines,
> >>>>> which is a really broad domain.
> >>>>>
> >>>>> I agree with Frances’ argument that scio is not an SDK, since it reuses
> >>>>> the existing Beam Java SDK. My proposition is that scio be called the
> >>>>> Scala API because in the end this is what it is. I think the confusion
> >>>>> comes from the common definition of SDK, which is normally an API + a
> >>>>> runtime. In this case scio will share the runtime with what we call the
> >>>>> Beam Java SDK.
> >>>>>
> >>>>> One additional point of using the term API is that it sends the clear
> >>>>> message that Beam has a Scala API too (which is good for visibility, as
> >>>>> JB mentioned).
> >>>>>
> >>>>> Regards,
> >>>>> Ismaël
> >>>>>
> >>>>> On Fri, Jun 24, 2016 at 5:08 PM, Jean-Baptiste Onofré <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Dan,
> >>>>>>
> >>>>>> fair enough.
> >>>>>>
> >>>>>> As I'm also working on new DSLs (XML, JSON), I already created the dsls
> >>>>>> module.
> >>>>>>
> >>>>>> So, I would say dsls/scala.
> >>>>>>
> >>>>>> WDYT?
> >>>>>>
> >>>>>> Regards
> >>>>>> JB
> >>>>>>
> >>>>>> On 06/24/2016 05:07 PM, Dan Halperin wrote:
> >>>>>>
> >>>>>>> I don't think that sdks/scala is the right place -- scio is not a Beam
> >>>>>>> Scala SDK; it wraps the existing Java SDK.
> >>>>>>>
> >>>>>>> Some options:
> >>>>>>> * sdks/java/extensions (Scio builds on the Java SDK) -- mentally vetoed
> >>>>>>>   since Scio isn't an extension for the Java SDK, but rather a wrapper
> >>>>>>> * dsls/java/scio (Scio is a Beam DSL that uses the Java SDK)
> >>>>>>> * dsls/scio (Scio is a Beam DSL that could eventually use multiple SDKs)
> >>>>>>> * extensions/java/scio (Scio is an extension of Beam that uses the Java
> >>>>>>>   SDK)
> >>>>>>> * extensions/scio (Scio is an extension of Beam that is not limited to
> >>>>>>>   one SDK)
> >>>>>>>
> >>>>>>> I lean towards either dsls/java/scio or extensions/java/scio, since I
> >>>>>>> don't think there are plans for Scio to handle multiple different SDKs
> >>>>>>> (in different languages). The question between these two is whether we
> >>>>>>> think DSLs are "big enough" to be a top level concept.
> >>>>>>>
> >>>>>>> On Thu, Jun 23, 2016 at 11:05 PM, Jean-Baptiste Onofré <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Good point about new Fn and the fact it's based on the Java SDK.
> >>>>>>>>
> >>>>>>>> It's just that in terms of "marketing", it's a good message to provide
> >>>>>>>> a Scala SDK even if technically it's more a DSL.
> >>>>>>>>
> >>>>>>>> For instance, a valid "marketing" DSL would be a Java fluent DSL on top
> >>>>>>>> of the Java SDK, or a declarative XML DSL.
> >>>>>>>>
> >>>>>>>> However, from a technical perspective, it can go into the dsl module.
> >>>>>>>>
> >>>>>>>> My $0.02 ;)
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> JB
> >>>>>>>>
> >>>>>>>> On 06/24/2016 06:51 AM, Frances Perry wrote:
> >>>>>>>>
> >>>>>>>>> +Rafal & Andrew again
> >>>>>>>>>
> >>>>>>>>> I am leaning towards DSL for two reasons: (1) scio uses the existing
> >>>>>>>>> Java execution environment (and won't have a language-specific fn
> >>>>>>>>> harness of its own), and (2) it changes the abstractions that users
> >>>>>>>>> interact with.
> >>>>>>>>>
> >>>>>>>>> I recently saw a scio repl demo from Reuven -- there's some really cool
> >>>>>>>>> stuff in there. I'd love to dive into it a bit more and see what can be
> >>>>>>>>> generalized beyond scio. The repl-like interactive graph construction
> >>>>>>>>> is very similar to what we've seen with ipython, in that it doesn't
> >>>>>>>>> always play nicely with the graph construction / graph execution
> >>>>>>>>> distinction. I wonder what changes to Beam might more generally support
> >>>>>>>>> this. The materialize stuff looks similar to some functionality in
> >>>>>>>>> FlumeJava we used to support multi-segment pipelines with some shared
> >>>>>>>>> intermediate PCollections.
> >>>>>>>>>
> >>>>>>>>> On Thu, Jun 23, 2016 at 9:22 PM, Jean-Baptiste Onofré <[email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Neville,
> >>>>>>>>>>
> >>>>>>>>>> thanks for the update!
> >>>>>>>>>>
> >>>>>>>>>> As it's support for another language, and to clearly identify the
> >>>>>>>>>> purpose, I would say sdks/scala.
> >>>>>>>>>> > >>>>>>>>>> Regards > >>>>>>>>>> JB > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 06/23/2016 11:56 PM, Neville Li wrote: > >>>>>>>>>> > >>>>>>>>>> +folks in my team > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> On Thu, Jun 23, 2016 at 5:57 PM Neville Li < > >>>>>>>>>>> > >>>>>>>>>> [email protected] > >>> > >>>> > >>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hi all, > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> I'm the co-author of Scio <https://github.com/spotify/scio> > >>>>>>>>>>>> > >>>>>>>>>>> and > >>> > >>>> am > >>>> > >>>>> in > >>>>>>>>>>>> the > >>>>>>>>>>>> progress of moving code to Beam (BEAM-302 > >>>>>>>>>>>> <https://issues.apache.org/jira/browse/BEAM-302>). Just > >>>>>>>>>>>> > >>>>>>>>>>> wondering > >>>> > >>>>> if > >>>>> > >>>>>> sdks/scala is the right place for this code or if something > >>>>>>>>>>>> > >>>>>>>>>>> like > >>> > >>>> dsls/scio > >>>>>>>>>>>> is a better choice? What do you think? > >>>>>>>>>>>> > >>>>>>>>>>>> A little background: Scio was built as a high-level Scala API > >>>>>>>>>>>> > >>>>>>>>>>> for > >>> > >>>> Google > >>>>>>>>>>>> Cloud Dataflow (now also Apache Beam) and is heavily > influenced > >>>>>>>>>>>> > >>>>>>>>>>> by > >>>> > >>>>> Spark > >>>>>>>>>>>> and Scalding. It wraps around the Dataflow/Beam Java SDK while > >>>>>>>>>>>> > >>>>>>>>>>> also > >>>> > >>>>> providing features comparable to other Scala data frameworks. > >>>>>>>>>>>> > >>>>>>>>>>> We > >>> > >>>> use > >>>>> > >>>>>> Scio > >>>>>>>>>>>> on Dataflow for production extensively inside Spotify. 
> >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Neville > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> -- > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Jean-Baptiste Onofré > >>>>>>>>>> [email protected] > >>>>>>>>>> http://blog.nanthrax.net > >>>>>>>>>> Talend - http://www.talend.com > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>> > >>>>>>>> Jean-Baptiste Onofré > >>>>>>>> [email protected] > >>>>>>>> http://blog.nanthrax.net > >>>>>>>> Talend - http://www.talend.com > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> -- > >>>>>> Jean-Baptiste Onofré > >>>>>> [email protected] > >>>>>> http://blog.nanthrax.net > >>>>>> Talend - http://www.talend.com > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > > -- > > Jean-Baptiste Onofré > > [email protected] > > http://blog.nanthrax.net > > Talend - http://www.talend.com > > >
