Re: [DISCUSS] Structuring Java based DSLs

David Morávek Fri, 30 Nov 2018 10:21:01 -0800

Jan, we made Kryo optional recently (it is a separate module and is used
only in tests). From a quick look, it seems that we forgot to remove the
compile-time dependency from Euphoria's *build.gradle*. The only "strong"
dependencies I'm aware of are the core SDK and Guava. We'll probably be
adding a dependency on the sketching extension soon.


D.

On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský <[email protected]> wrote:

> Hi Anton,
> reactions inline.
>
> ---------- Original e-mail ----------
> From: Anton Kedin <[email protected]>
> To: [email protected]
> Date: 30. 11. 2018 18:17:06
> Subject: Re: [DISCUSS] Structuring Java based DSLs
>
> I think this approach makes sense in general; Euphoria can be an
> implementation detail of SQL, similar to the Join Library or core SDK Schemas.
>
> I wonder though whether it would be better to bring Euphoria closer to
> core SDK first, maybe even merge them together. If you look at Reuven's
> recent work around schemas it seems like there are already similarities
> between that and Euphoria's approach, unless I'm missing the point (e.g.
> Filter transforms, FullJoin vs CoGroup... see [2]). And we're already
> switching parts of SQL to those transforms (e.g. SQL Aggregation is now
> implemented by core SDK's Group[3]).
>
>
>
> Yes, these transforms seem to be very similar to those Euphoria has.
> Whether or not to merge Euphoria with core is essentially just a decision
> for the community (from my point of view).
>
>
>
> Adding explicit Schema support to Euphoria will both bring it closer to
> the core SDK and make it natural to use for SQL. Can this be a first step
> towards this integration?
>
>
>
> Euphoria currently operates on plain PCollections, so when a PCollection
> has a schema, it will be accessible to Euphoria. It makes sense to make use
> of the schema in Euphoria. It seems natural on the inputs to Euphoria
> operators, but it might be tricky (not saying impossible) to actually
> produce schema-aware PCollections as outputs from Euphoria operators
> (generally speaking; in special cases that might be possible). Regarding
> inputs, there is actually an intention to act on the type of the
> PCollection: e.g. when the PCollection is already of type KV, it is
> possible to make the key extractor and value extractor optional in Euphoria
> builders. So it feels natural to adapt the builders in the same way when
> given a schema-aware PCollection, and to make use of the provided schema.
> The rest of the Euphoria team may correct me if I'm wrong.
>
>
>
>
> One question I have is: does Euphoria bring dependencies that are not
> needed by SQL, or does it more or less rely only on the core SDK?
>
>
>
> I think the only relevant dependency that Euphoria has besides the core SDK
> is Kryo. It is the default coder when no coder is provided, but that could
> be made optional, e.g. the default coder would be supported only if an
> appropriate module were available. That way, I think Euphoria would have
> no special dependencies.
>
>
>
> [1]
> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java#L73
> [2]
> https://github.com/apache/beam/tree/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
> [3]
> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L179
>
>
>
> On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský <[email protected]> wrote:
>
> Hi community,
>
> I'm part of the Euphoria DSL team, and on behalf of this team, I'd like to
> discuss the possible development of the Java based DSLs currently present
> in Beam. To my knowledge, there are currently two DSLs based on the Java
> SDK: Euphoria and SQL. These DSLs currently share only the SDK itself,
> although there might be room to share some more effort. We already know
> that both Euphoria and SQL need retractions, but there are
> probably many more features that these two could share.
>
> So, I'd like to open a discussion on what it would cost and what it
> would possibly bring, if instead of the current structure
>
>    Java SDK
>
>      | ---- SQL
>
>      | ---- Euphoria
>
> these DSLs would be structured as
>
>    Java SDK ---> Euphoria ---> SQL
>
> I'm absolutely sure that this would be a great investment and a huge
> change, but I'd like to gather some opinions and general feelings of the
> community about this. Some points to start the discussion from my side:
> structuring the DSLs like this has internal logical consistency, because
> each API layer further narrows completeness but brings a simpler API for
> simpler tasks, while adding an additional high-level view of the data
> processing pipeline and thus enabling more optimizations. On the Euphoria
> side, these are various join implementations (the most effective
> implementation depends on the data), pipeline sampling and more. Some (or
> maybe most) of these optimizations would have to be implemented in both
> DSLs, so implementing them once is beneficial. Another benefit is that this
> would bring Euphoria "closer" to Beam core development (which would be
> good, it is part of the project anyway, right? :)) and help better drive
> features that, although currently needed mostly by SQL, might be needed by
> other Java users anyway.
>
> Thanks for the discussion; I'm looking forward to any opinions.
>
>    Jan
>
>
