Re: [DISCUSS] Structuring Java based DSLs

Jan Lukavský Fri, 30 Nov 2018 14:06:02 -0800

Hi Robert,

Euphoria must be a superset of SQL for the proposed approach to work. I
think it already is, or at least can be made so. Some subtleties might be
missing or differ, but that is the nice thing: by building the DSLs bottom
up, we can make sure that they are mutually consistent, i.e. that there are
not multiple implementations of join semantics with slightly different
behavior. It is of course possible to take the common parts and make a
separate library, but the way I see it, it should be possible to make
Euphoria itself this shared library. There are (currently) no known
features that would imply incompatibility between the two (which would
force the approach you propose).
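
To illustrate the consistency argument, here is a minimal sketch in plain Java collections, not Beam or Euphoria code. All names (SharedJoinSketch, innerJoin, joinRowsOn) are hypothetical. The point is only that a higher layer which delegates to the lower layer's join cannot drift from its semantics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch: neither class nor methods exist in Beam or Euphoria.
final class SharedJoinSketch {

  /** Lower layer: the single inner-join implementation (a simple hash join). */
  static <L, R, K, O> List<O> innerJoin(
      List<L> left,
      List<R> right,
      Function<L, K> leftKey,
      Function<R, K> rightKey,
      BiFunction<L, R, O> joiner) {
    // Index the right side by key.
    Map<K, List<R>> index = new HashMap<>();
    for (R r : right) {
      index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
    }
    // Probe the index with each left element, preserving left order.
    // Note: null keys match each other here, which SQL would not do;
    // this is the kind of subtlety the layers would have to agree on.
    List<O> out = new ArrayList<>();
    for (L l : left) {
      for (R r : index.getOrDefault(leftKey.apply(l), Collections.emptyList())) {
        out.add(joiner.apply(l, r));
      }
    }
    return out;
  }

  /** Higher layer: a row-oriented equi-join on one column, delegating to innerJoin. */
  static List<Map<String, Object>> joinRowsOn(
      String column, List<Map<String, Object>> left, List<Map<String, Object>> right) {
    return innerJoin(
        left,
        right,
        row -> row.get(column),
        row -> row.get(column),
        (l, r) -> {
          // Merge both rows; right-side values win on column name clashes.
          Map<String, Object> merged = new LinkedHashMap<>(l);
          merged.putAll(r);
          return merged;
        });
  }
}
```

Any change to matching behavior in innerJoin is picked up by joinRowsOn automatically, which is the property argued for above.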


 Jan

---------- Original e-mail ----------
From: Robert Bradshaw <[email protected]>
To: [email protected]
Date: 30. 11. 2018 21:39:01
Subject: Re: [DISCUSS] Structuring Java based DSLs
"I don't really see Euphoria as a subset of SQL or the other way
around, and I think it makes sense to use either without the other, so
by this criterion keeping them as siblings makes more sense than nesting.

That said, I think it's really good to have a bunch of shared code,
e.g. a join library that could be used by both. One could even depend
on the other without having to abandon the sibling relationship.
Something like retractions belongs in the core SDK itself. Deeper than
that, actually: it should be part of the model.

- Robert

On Fri, Nov 30, 2018 at 7:20 PM David Morávek <[email protected]> wrote:
>
> Jan, we made Kryo optional recently (it is a separate module and is used
> only in tests). From a quick look it seems that we forgot to remove the
> compile-time dependency from Euphoria's build.gradle. The only "strong"
> dependencies I'm aware of are the core SDK and Guava. We'll probably be
> adding a dependency on the sketching extension soon.
>
> D.
>
> On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský <[email protected]> wrote:
>>
>> Hi Anton,
>> reactions inline.
>>
>> ---------- Original e-mail ----------
>> From: Anton Kedin <[email protected]>
>> To: [email protected]
>> Date: 30. 11. 2018 18:17:06
>> Subject: Re: [DISCUSS] Structuring Java based DSLs
>>
>> I think this approach makes sense in general. Euphoria can be an
>> implementation detail of SQL, similar to the Join Library or core SDK
>> Schemas.
>>
>> I wonder though whether it would be better to bring Euphoria closer to
>> core SDK first, maybe even merge them together. If you look at Reuven's
>> recent work around schemas it seems like there are already similarities
>> between that and Euphoria's approach, unless I'm missing the point (e.g.
>> Filter transforms, FullJoin vs CoGroup... see [2]). And we're already
>> switching parts of SQL to those transforms (e.g. SQL Aggregation is now
>> implemented by core SDK's Group[3]).
>>
>>
>>
>> Yes, these transforms seem to be very similar to those Euphoria has.
>> Whether or not to merge Euphoria with core is essentially just a decision
>> of the community (from my point of view).
>>
>>
>>
>> Adding explicit Schema support to Euphoria will both bring it closer to
>> the core SDK and make it natural to use for SQL. Can this be a first step
>> towards this integration?
>>
>>
>>
>> Euphoria currently operates on plain PCollections, so when a PCollection
>> has a schema, that schema will be accessible to Euphoria. It makes sense
>> to use the schema in Euphoria: it seems natural on the inputs to Euphoria
>> operators, but it might be tricky (not saying impossible) to actually
>> produce schema-aware PCollections as outputs from Euphoria operators
>> (generally speaking; in special cases that might be possible). Regarding
>> inputs, there is already an intention to act on the type of the
>> PCollection: e.g. when the PCollection is already of type KV, the key
>> extractor and value extractor can be made optional in Euphoria builders.
>> So it feels natural to adapt the builders in the same way when given a
>> schema-aware PCollection and to make use of the provided schema. The rest
>> of the Euphoria team may correct me if I'm wrong.
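
The idea of making the key extractor optional for input that is already keyed can be sketched as follows. This is plain Java over lists, with Map.Entry standing in for KV. The names CountByKeySketch, countBy and countByKey are hypothetical and are not Euphoria's builder API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: Map.Entry stands in for Beam's KV, List for PCollection.
final class CountByKeySketch {

  /** General form: the caller must say how to extract the key. */
  static <T, K> Map<K, Long> countBy(List<T> input, Function<T, K> keyExtractor) {
    Map<K, Long> counts = new LinkedHashMap<>();
    for (T element : input) {
      counts.merge(keyExtractor.apply(element), 1L, Long::sum);
    }
    return counts;
  }

  /** Keyed form: the element type already carries the key, so no extractor is needed. */
  static <K, V> Map<K, Long> countByKey(List<Map.Entry<K, V>> input) {
    return countBy(input, Map.Entry::getKey);
  }
}
```

A schema-aware input could be handled the same way in principle, with the default extractor reading a named field instead of calling getKey; that part is an assumption and is not shown.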
>>
>>
>>
>>
>> One question I have is, does Euphoria bring dependencies that are not
>> needed by SQL, or does it more or less rely only on the core SDK?
>>
>>
>>
>> I think the only relevant dependency that Euphoria has besides the core
>> SDK is Kryo. It is the default coder when no coder is provided, but that
>> could be made optional, e.g. the default coder would be supported only if
>> an appropriate module were available. That way, I think Euphoria would
>> have no special dependencies.
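
One common way to make such a default optional is to probe the classpath at runtime and fall back only when the optional module is present. The sketch below assumes that approach. The class OptionalCoderSketch, its methods, and the string-based "coder" are hypothetical simplifications and do not reflect how Euphoria or Beam resolve coders.

```java
// Hypothetical sketch: coders are represented by plain strings for brevity.
final class OptionalCoderSketch {

  /** Returns true if the named class can be loaded, without initializing it. */
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, OptionalCoderSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Picks the explicit coder when given; otherwise uses the default only if the
   * optional module (identified by one of its classes) is available.
   */
  static String resolveCoder(String explicitCoderOrNull, String optionalModuleClass) {
    if (explicitCoderOrNull != null) {
      return explicitCoderOrNull;
    }
    if (isOnClasspath(optionalModuleClass)) {
      return "default-from:" + optionalModuleClass;
    }
    throw new IllegalStateException(
        "No coder provided and optional module not on classpath: " + optionalModuleClass);
  }
}
```

With Kryo, the probe class could be com.esotericsoftware.kryo.Kryo; a pipeline without the optional module would then fail early with a clear message unless every coder is given explicitly.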
>>
>>
>>
>> [1] https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java#L73
>> [2] https://github.com/apache/beam/tree/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
>> [3] https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L179
>>
>>
>>
>> On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský <[email protected]> wrote:
>>
>> Hi community,
>>
>> I'm part of the Euphoria DSL team, and on behalf of this team, I'd like
>> to discuss possible development of the Java based DSLs currently present
>> in Beam. To my knowledge, there are currently two DSLs based on the Java
>> SDK: Euphoria and SQL. These DSLs currently share only the SDK itself,
>> although there might be room to share some more effort. We already know
>> that both Euphoria and SQL have a need for retractions, but there are
>> probably many more features that these two could share.
>>
>> So, I'd like to open a discussion on what it would cost and what it
>> would possibly bring, if instead of the current structure
>>
>> Java SDK
>>
>> | ---- SQL
>>
>> | ---- Euphoria
>>
>> these DSLs would be structured as
>>
>> Java SDK ---> Euphoria ---> SQL
>>
>> I'm absolutely sure that this would be a great investment and a huge
>> change, but I'd like to gather some opinions and general feelings of the
>> community about this. One point to start the discussion from my side is
>> that structuring the DSLs like this has internal logical consistency:
>> each API layer further narrows completeness, but brings a simpler API
>> for simpler tasks, while adding a higher-level view of the data
>> processing pipeline and thus enabling more optimizations. On the
>> Euphoria side, these are various implementations of joins (the most
>> effective implementation depends on the data), pipeline sampling and
>> more. Some (or maybe most) of these optimizations would have to be
>> implemented in both DSLs, so implementing them once is beneficial.
>> Another benefit is that this would bring Euphoria "closer" to Beam core
>> development (which would be good, it is part of the project anyway,
>> right? :)) and help better drive features that, although currently
>> needed mostly by SQL, might be needed by other Java users anyway.
>>
>> Thanks for the discussion; I'm looking forward to any opinions.
>>
>> Jan
>>
"
