I suppose what I'm trying to say is that I see this module structure as a tool for discoverability and for enumerating end-user endpoints. In other words, if one wants to use SQL, it would seem odd to have to depend on sdks-java-euphoria-sql rather than just sdks-java-sql when sdks-java-euphoria is also a DSL one might use. A sibling relationship between the modules does not prohibit the layered approach to implementation, which sounds like it makes sense.
(As for merging Euphoria into core, my initial impression is that's probably a good idea, and something we should consider for 3.0 at the very least.)

On Fri, Nov 30, 2018 at 11:06 PM Jan Lukavský <je...@seznam.cz> wrote:
>
> Hi Rui,
>
> yes, there are optimizations that could be added by each layer. The purpose
> of the Euphoria layer is actually not to reorder or modify any user
> operators that are present in the pipeline (because it might not have enough
> information to do this), but it can for instance choose between various join
> implementations (shuffle join, broadcast join, ...) - so the optimizations it
> can do are more low level. But this plays nicely with the DSL hierarchy -
> each layer adds a few more restrictions, but can therefore do more
> optimizations. And I think that the layer between SDK and SQL wouldn't have
> to support SQL optimizations, it would only have to provide a way for SQL to
> express these optimizations.
>
> Jan
>
> ---------- Original e-mail ----------
> From: Rui Wang <ruw...@google.com>
> To: dev@beam.apache.org
> Date: 30. 11. 2018 22:43:04
> Subject: Re: [DISCUSS] Structuring Java based DSLs
>
> SQL's optimization is another area to consider for integration. SQL
> optimization includes pushing down filters/projections, merging, removing
> or swapping plan nodes, and comparing plan costs to choose the best plan.
> Adding another layer between SQL and Java core might require that layer to
> support SQL optimizations if there is a need.
>
> I don't have a clear picture of what SQL needs from Euphoria for
> optimization (best case is nothing). As those optimizations are happening or
> will happen, we might start to get a sense of it.
>
> -Rui
>
> On Fri, Nov 30, 2018 at 12:38 PM Robert Bradshaw <rober...@google.com> wrote:
>
> I don't really see Euphoria as a subset of SQL or the other way
> around, and I think it makes sense to use either without the other, so
> by this criterion keeping them as siblings makes more sense than nesting.
>
> That said, I think it's really good to have a bunch of shared code,
> e.g. a join library that could be used by both. One could even depend
> on the other without having to abandon the sibling relationship.
> Something like retractions belongs in the core SDK itself. Deeper than
> that, actually, it should be part of the model.
>
> - Robert
>
> On Fri, Nov 30, 2018 at 7:20 PM David Morávek <d...@apache.org> wrote:
> >
> > Jan, we made Kryo optional recently (it is a separate module and is used
> > only in tests). From a quick look it seems that we forgot to remove the
> > compile-time dependency from Euphoria's build.gradle. The only "strong"
> > dependencies I'm aware of are core SDK and Guava. We'll probably be
> > adding a sketching extension dependency soon.
> >
> > D.
> >
> > On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský <je...@seznam.cz> wrote:
> >>
> >> Hi Anton,
> >> reactions inline.
> >>
> >> ---------- Original e-mail ----------
> >> From: Anton Kedin <ke...@google.com>
> >> To: dev@beam.apache.org
> >> Date: 30. 11. 2018 18:17:06
> >> Subject: Re: [DISCUSS] Structuring Java based DSLs
> >>
> >> I think this approach makes sense in general, Euphoria can be the
> >> implementation detail of SQL, similar to Join Library or core SDK Schemas.
> >>
> >> I wonder though whether it would be better to bring Euphoria closer to
> >> core SDK first, maybe even merge them together. If you look at Reuven's
> >> recent work around schemas it seems like there are already similarities
> >> between that and Euphoria's approach, unless I'm missing the point (e.g.
> >> Filter transforms, FullJoin vs CoGroup... see [2]). And we're already
> >> switching parts of SQL to those transforms (e.g. SQL Aggregation is now
> >> implemented by core SDK's Group[3]).
> >>
> >>
> >>
> >> Yes, these transforms seem to be very similar to those Euphoria has.
> >> Whether or not to merge Euphoria with core is essentially just a decision
> >> of the community (from my point of view).
> >>
> >>
> >>
> >> Adding explicit Schema support to Euphoria will both bring it closer to
> >> core SDK and make it natural to use for SQL. Can this be a first step
> >> towards this integration?
> >>
> >>
> >>
> >> Euphoria currently operates on pure PCollections, so when a PCollection
> >> has a schema, it will be accessible by Euphoria. It makes sense to make
> >> use of the schema in Euphoria - it seems natural on inputs to Euphoria
> >> operators, but it might be tricky (not saying impossible) to actually
> >> produce schema-aware PCollections as outputs from Euphoria operators
> >> (generally speaking; in special cases that might be possible). Regarding
> >> inputs, there is actually an intention to act on the type of the
> >> PCollection - e.g. when a PCollection is already of type KV, it is
> >> possible to make the key extractor and value extractor optional in
> >> Euphoria builders, so it feels natural to also adapt the builders when
> >> the input is a schema-aware PCollection, and make use of the provided
> >> schema. The rest of the Euphoria team might correct me if I'm wrong.
> >>
> >>
> >>
> >> One question I have is, does Euphoria bring dependencies that are not
> >> needed by SQL, or does it more or less only rely on the core SDK?
> >>
> >>
> >>
> >> I think the only relevant dependency that Euphoria has besides core SDK is
> >> Kryo. It is the default coder when no coder is provided, but that could be
> >> made optional - e.g. the default coder would be supported only if an
> >> appropriate module is available. That way, I think Euphoria would have
> >> no special dependencies.
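One way to read the "default coder only if an appropriate module is available" idea is as a classpath check made when the pipeline is built. The sketch below is only an illustration in plain Java, not Euphoria's or Beam's actual mechanism: the class `OptionalDefaultCoderSketch`, its method names, and the use of strings to stand in for coder objects are all hypothetical. `com.esotericsoftware.kryo.Kryo` is Kryo's main class and is used here only as the marker to look for.

```java
import java.util.Optional;

public class OptionalDefaultCoderSketch {

  /** Returns true if the named class can be found on the current classpath. */
  public static boolean isOnClasspath(String className) {
    try {
      // initialize=false: only locate the class, do not run its static initializers.
      Class.forName(className, false, OptionalDefaultCoderSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Picks a coder name: the explicit one if given, otherwise a fallback that is
   * offered only when the optional module providing it is present.
   * Strings stand in for real coder objects (an assumption of this sketch).
   */
  public static String resolveCoder(Optional<String> explicitCoder, String optionalModuleClass) {
    if (explicitCoder.isPresent()) {
      return explicitCoder.get();
    }
    if (isOnClasspath(optionalModuleClass)) {
      return "default-coder-from:" + optionalModuleClass;
    }
    throw new IllegalStateException(
        "No coder given and optional module " + optionalModuleClass + " is not on the classpath");
  }

  public static void main(String[] args) {
    // An explicit coder always wins, whether or not the optional module is present.
    System.out.println(resolveCoder(Optional.of("my-coder"), "com.esotericsoftware.kryo.Kryo"));
    // java.lang.String is always present, so the fallback path is taken here.
    System.out.println(resolveCoder(Optional.empty(), "java.lang.String"));
  }
}
```

With a check like this, a missing optional module only becomes an error for users who rely on the default, which matches the intent of keeping Kryo out of the required dependencies.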
> >> > >> > >> > >> [1] > >> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java#L73 > >> [2] > >> https://github.com/apache/beam/tree/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms > >> [3] > >> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L179 > >> > >> > >> > >> On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský <je...@seznam.cz> wrote: > >> > >> Hi community, > >> > >> I'm part of Euphoria DSL team, and on behalf of this team, I'd like to > >> discuss possible development of Java based DSLs currently present in > >> Beam. In my knowledge, there are currently two DSLs based on Java SDK - > >> Euphoria and SQL. These DSLs currently share only the SDK itself, > >> although there might be room to share some more effort. We already know > >> that both Euphoria and SQL have need for retractions, but there are > >> probably many more features that these two could share. > >> > >> So, I'd like to open a discussion on what it would cost and what it > >> would possibly bring, if instead of the current structure > >> > >> Java SDK > >> > >> | ---- SQL > >> > >> | ---- Euphoria > >> > >> these DSLs would be structured as > >> > >> Java SDK ---> Euphoria ---> SQL > >> > >> I'm absolutely sure that this would be a great investment and a huge > >> change, but I'd like to gather some opinions and general feelings of the > >> community about this. 
> >> Some points to start the discussion from my side
> >> would be that structuring DSLs like this has internal logical
> >> consistency, because each API layer further narrows completeness, but
> >> brings a simpler API for simpler tasks, while adding an additional
> >> high-level view of the data processing pipeline and thus enabling more
> >> optimizations. On the Euphoria side, these are various join
> >> implementations (the most effective implementation depends on the data),
> >> pipeline sampling and more. Some (or maybe most) of these optimizations
> >> would have to be implemented in both DSLs, so implementing them once is
> >> beneficial. Another benefit is that this would bring Euphoria "closer" to
> >> Beam core development (which would be good, it is part of the project
> >> anyway, right? :)) and help better drive features that, although
> >> currently needed mostly by SQL, might be needed by other Java users
> >> anyway.
> >>
> >> Thanks for the discussion, and looking forward to any opinions.
> >>
> >> Jan
> >>
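
The point that the most effective join implementation depends on the data, and that such a choice could be implemented once and shared, can be illustrated with a small size-based rule. This is a minimal sketch in plain Java, not code from Euphoria, Beam SQL, or the join library: the class `JoinStrategySketch`, the enum, the size estimates, and the threshold value are all hypothetical and chosen only for illustration.

```java
public class JoinStrategySketch {

  /** The two physical join implementations named in the thread. */
  public enum JoinStrategy { BROADCAST, SHUFFLE }

  /**
   * Chooses a join implementation from estimated input sizes.
   * If the smaller side fits under the limit it can be shipped to every worker
   * (broadcast join); otherwise both sides are repartitioned by key (shuffle join).
   */
  public static JoinStrategy choose(long leftBytes, long rightBytes, long broadcastLimitBytes) {
    long smallerSide = Math.min(leftBytes, rightBytes);
    return smallerSide <= broadcastLimitBytes ? JoinStrategy.BROADCAST : JoinStrategy.SHUFFLE;
  }

  public static void main(String[] args) {
    long limit = 64L * 1024 * 1024; // hypothetical 64 MiB broadcast limit

    // 10 MiB joined with 500 GiB: the small side fits, prints BROADCAST.
    System.out.println(choose(10L * 1024 * 1024, 500L * 1024 * 1024 * 1024, limit));

    // 200 GiB joined with 500 GiB: neither side fits, prints SHUFFLE.
    System.out.println(choose(200L * 1024 * 1024 * 1024, 500L * 1024 * 1024 * 1024, limit));
  }
}
```

A rule like this needs only size estimates, so it could live in shared code and be used by either DSL, whichever module layout is chosen.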