Re: [DISCUSS] Structuring Java based DSLs

Robert Bradshaw Fri, 30 Nov 2018 15:55:44 -0800

I suppose what I'm trying to say is that I see this module structure
as a tool for discoverability and enumerating end-user endpoints. In
other words, if one wants to use SQL, it would seem odd to have to
depend on sdks-java-euphoria-sql rather than just sdks-java-sql if
sdks-java-euphoria is also a DSL one might use. A sibling relationship
does not prohibit the layered approach to implementation that sounds
like it makes sense.


(As for merging Euphoria into core, my initial impression is that's
probably a good idea, and something we should consider for 3.0 at the
very least.)

On Fri, Nov 30, 2018 at 11:06 PM Jan Lukavský <[email protected]> wrote:
>
> Hi Rui,
>
> yes, there are optimizations that could be added by each layer. The purpose
> of the Euphoria layer is actually not to reorder or modify any user operators
> that are present in the pipeline (because it might not have enough
> information to do this), but it can, for instance, choose between various join
> implementations (shuffle join, broadcast join, ...) - so the optimizations it
> can do are more low-level. But this plays nicely with the DSL hierarchy -
> each layer adds a few more restrictions, but can therefore do more
> optimizations. And I think that the layer between the SDK and SQL wouldn't have
> to support SQL optimizations; it would only have to provide a way for SQL to
> express these optimizations.
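
As a rough illustration of the low-level choice described here, the sketch below picks between a broadcast join and a shuffle join from estimated input sizes. The class, the enum, and the threshold parameter are hypothetical and are not taken from Euphoria or the Beam SDK; a real implementation would also need a source for the size estimates (for example runner statistics or sampling), which is assumed away here.

```java
/** Minimal sketch of a size-based choice between join implementations. */
final class JoinStrategyChooser {
  private JoinStrategyChooser() {}

  /** Hypothetical strategy names; illustrative only, not Euphoria's API. */
  enum Strategy { BROADCAST, SHUFFLE }

  /**
   * Returns BROADCAST when the smaller input is estimated to fit under the
   * threshold, otherwise SHUFFLE. All sizes are estimates in bytes.
   */
  static Strategy choose(long leftBytes, long rightBytes, long broadcastThresholdBytes) {
    if (leftBytes < 0 || rightBytes < 0 || broadcastThresholdBytes < 0) {
      throw new IllegalArgumentException("sizes must be non-negative");
    }
    // Only the smaller side would be replicated to every worker.
    long smaller = Math.min(leftBytes, rightBytes);
    return smaller <= broadcastThresholdBytes ? Strategy.BROADCAST : Strategy.SHUFFLE;
  }
}
```

The user's operators stay untouched in this picture; only the physical implementation of the join changes, which matches the "more low-level" kind of optimization mentioned above.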
>
>   Jan
>
> ---------- Original e-mail ----------
> From: Rui Wang <[email protected]>
> To: [email protected]
> Date: 30. 11. 2018 22:43:04
> Subject: Re: [DISCUSS] Structuring Java based DSLs
>
> SQL's optimization is another area to consider for integration. SQL
> optimization includes pushing down filters/projections, merging, removing,
> or swapping plan nodes, and comparing plan costs to choose the best plan.
> Adding another layer between SQL and the Java core might require that layer
> to support SQL optimizations, if there is a need.
>
> I don't have a clear picture of what SQL needs from Euphoria for
> optimization (best case is nothing). As those optimizations are happening or
> will happen, we might start to get a sense of it.
>
> -Rui
>
> On Fri, Nov 30, 2018 at 12:38 PM Robert Bradshaw <[email protected]> wrote:
>
> I don't really see Euphoria as a subset of SQL or the other way
> around, and I think it makes sense to use either without the other, so
> by this criterion it makes sense to keep them as siblings rather than
> nesting one under the other.
>
> That said, I think it's really good to have a bunch of shared code,
> e.g. a join library that could be used by both. One could even depend
> on the other without having to abandon the sibling relationship.
> Something like retractions belongs in the core SDK itself. Deeper than
> that, actually, it should be part of the model.
>
> - Robert
>
> On Fri, Nov 30, 2018 at 7:20 PM David Morávek <[email protected]> wrote:
> >
> > Jan, we made Kryo optional recently (it is a separate module and is used
> > only in tests). From a quick look it seems that we forgot to remove the
> > compile-time dependency from euphoria's build.gradle. The only "strong"
> > dependencies I'm aware of are the core SDK and guava. We'll probably be
> > adding a dependency on the sketching extension soon.
> >
> > D.
> >
> > On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský <[email protected]> wrote:
> >>
> >> Hi Anton,
> >> reactions inline.
> >>
> >> ---------- Original e-mail ----------
> >> From: Anton Kedin <[email protected]>
> >> To: [email protected]
> >> Date: 30. 11. 2018 18:17:06
> >> Subject: Re: [DISCUSS] Structuring Java based DSLs
> >>
> >> I think this approach makes sense in general. Euphoria can be an
> >> implementation detail of SQL, similar to Join Library or core SDK Schemas.
> >>
> >> I wonder though whether it would be better to bring Euphoria closer to 
> >> core SDK first, maybe even merge them together. If you look at Reuven's 
> >> recent work around schemas it seems like there are already similarities 
> >> between that and Euphoria's approach, unless I'm missing the point (e.g. 
> >> Filter transforms, FullJoin vs CoGroup... see [2]). And we're already 
> >> switching parts of SQL to those transforms (e.g. SQL Aggregation is now 
> >> implemented by core SDK's Group[3]).
> >>
> >>
> >>
> >> Yes, these transforms seem to be very similar to those Euphoria has.
> >> Whether or not to merge Euphoria with core is essentially just a decision
> >> of the community (from my point of view).
> >>
> >>
> >>
> >> Adding explicit Schema support to Euphoria will bring it both closer to 
> >> core SDK and make it natural to use for SQL. Can this be a first step 
> >> towards this integration?
> >>
> >>
> >>
> >> Euphoria currently operates on pure PCollections, so when a PCollection has
> >> a schema, it will be accessible by Euphoria. It makes sense to make use of
> >> the schema in Euphoria - it seems natural on inputs to Euphoria operators,
> >> but it might be tricky (not saying impossible) to actually produce
> >> schema-aware PCollections as outputs from Euphoria operators (generally
> >> speaking; in special cases that might be possible). Regarding inputs,
> >> there is actually an intention to act on the type of the PCollection - e.g.
> >> when the PCollection is already of type KV, it is possible to make the key
> >> extractor and value extractor optional in Euphoria builders, so it feels
> >> natural to adapt the builders in the same way when a schema-aware
> >> PCollection is given, and make use of the provided schema. The rest of the
> >> Euphoria team might correct me if I'm wrong.
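
The idea of making extractors optional for already-keyed inputs can be sketched in plain Java. This is only an illustration under assumptions: `java.util.Map.Entry` stands in for Beam's KV, an `Iterable` stands in for a PCollection, and the class and method names are hypothetical rather than Euphoria's builder API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Minimal sketch: extractors become optional when the element type already carries key and value. */
final class ExtractorDefaults {
  private ExtractorDefaults() {}

  /** General form: the caller must say how to obtain a key and a value from each element. */
  static <T, K, V> Map<K, List<V>> groupBy(
      Iterable<? extends T> input,
      Function<? super T, ? extends K> keyExtractor,
      Function<? super T, ? extends V> valueExtractor) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (T element : input) {
      grouped
          .computeIfAbsent(keyExtractor.apply(element), k -> new ArrayList<>())
          .add(valueExtractor.apply(element));
    }
    return grouped;
  }

  /** Keyed form: for key/value elements both extractors can default to the obvious accessors. */
  static <K, V> Map<K, List<V>> groupByEntries(Iterable<? extends Map.Entry<K, V>> input) {
    return ExtractorDefaults.<Map.Entry<K, V>, K, V>groupBy(
        input, Map.Entry::getKey, Map.Entry::getValue);
  }
}
```

A schema-aware input could be handled along the same lines, with the defaults derived from the schema's fields instead of from the key/value accessors; how that would look in practice is left open here.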
> >>
> >>
> >>
> >>
> >> One question I have is, does Euphoria bring dependencies that are not
> >> needed by SQL, or does it more or less rely only on the core SDK?
> >>
> >>
> >>
> >> I think the only relevant dependency that Euphoria has besides the core
> >> SDK is Kryo. It is the default coder when no coder is provided, but that
> >> could be made optional - e.g. the default coder would be supported only if
> >> an appropriate module were available. That way I think that Euphoria has
> >> no special dependencies.
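
One common way to support a default only when an optional module is present is to probe the classpath at runtime. The sketch below shows that pattern under assumptions: the class, the method names, and the "kryo-default" label are hypothetical, and this is not a description of how Euphoria actually wires its Kryo module. The only real identifier is `com.esotericsoftware.kryo.Kryo`, Kryo's main class, used purely as a marker.

```java
/** Minimal sketch: enable a default coder only when an optional module is on the classpath. */
final class OptionalCoderSupport {
  private OptionalCoderSupport() {}

  /** Kryo's main class, used here only to detect that the optional module is present. */
  static final String KRYO_MARKER_CLASS = "com.esotericsoftware.kryo.Kryo";

  /** Returns true if the named class can be located, without initializing it. */
  static boolean isClassAvailable(String className) {
    try {
      Class.forName(className, false, OptionalCoderSupport.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Picks a coder label: the explicit one if given, else the default if its
   * module is available, else fails with an actionable message.
   */
  static String resolveCoder(String explicitCoder, boolean defaultAvailable) {
    if (explicitCoder != null) {
      return explicitCoder;
    }
    if (defaultAvailable) {
      return "kryo-default"; // hypothetical label for the optional default coder
    }
    throw new IllegalStateException(
        "No coder was provided and the optional default coder module is not on the classpath");
  }
}
```

A caller would combine the two, for example `resolveCoder(userCoder, isClassAvailable(KRYO_MARKER_CLASS))`, so that the hard dependency disappears from the main module while users who add the optional module keep the convenient default.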
> >>
> >>
> >>
> >> [1] 
> >> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java#L73
> >> [2] 
> >> https://github.com/apache/beam/tree/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
> >> [3] 
> >> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L179
> >>
> >>
> >>
> >> On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský <[email protected]> wrote:
> >>
> >> Hi community,
> >>
> >> I'm part of the Euphoria DSL team, and on behalf of this team, I'd like to
> >> discuss possible development of the Java based DSLs currently present in
> >> Beam. To my knowledge, there are currently two DSLs based on the Java SDK -
> >> Euphoria and SQL. These DSLs currently share only the SDK itself,
> >> although there might be room to share some more effort. We already know
> >> that both Euphoria and SQL have a need for retractions, but there are
> >> probably many more features that these two could share.
> >>
> >> So, I'd like to open a discussion on what it would cost and what it
> >> would possibly bring, if instead of the current structure
> >>
> >>    Java SDK
> >>
> >>      | ---- SQL
> >>
> >>      | ---- Euphoria
> >>
> >> these DSLs would be structured as
> >>
> >>    Java SDK ---> Euphoria ---> SQL
> >>
> >> I'm absolutely sure that this would be a great investment and a huge
> >> change, but I'd like to gather some opinions and general feelings of the
> >> community about this. One point to start the discussion from my side
> >> is that structuring DSLs like this has internal logical
> >> consistency, because each API layer further narrows completeness, but
> >> brings a simpler API for simpler tasks, while adding an additional
> >> high-level view of the data processing pipeline and thus enabling more
> >> optimizations. On the Euphoria side, these are various implementations of
> >> joins (the most effective implementation depends on the data), pipeline
> >> sampling and more. Some (or maybe most) of these optimizations would have
> >> to be implemented in both DSLs, so implementing them once is beneficial.
> >> Another benefit is that this would bring Euphoria "closer" to Beam core
> >> development (which would be good, it is part of the project anyway,
> >> right? :)) and help better drive features that, although currently
> >> needed mostly by SQL, might be needed by other Java users anyway.
> >>
> >> Thanks for discussion and looking forward to any opinions.
> >>
> >>    Jan
> >>
