Re: [DISCUSS] Structuring Java based DSLs

Jan Lukavský Fri, 30 Nov 2018 23:39:01 -0800

I think that the fact that SQL uses some other internal dependency should remain a hidden implementation detail. I absolutely agree that the dependency should of course remain sdks-java-sql in all cases.

Jan


On 12/1/18 12:54 AM, Robert Bradshaw wrote:

I suppose what I'm trying to say is that I see this module structure
as a tool for discoverability and enumerating end-user endpoints. In
other words, if one wants to use SQL, it would seem odd to have to
depend on sdks-java-euphoria-sql rather than just sdks-java-sql if
sdks-java-euphoria is also a DSL one might use. A sibling relationship
does not prohibit the layered approach to implementation that sounds
like it makes sense.

(As for merging Euphoria into core, my initial impression is that's
probably a good idea, and something we should consider for 3.0 at the
very least.)

On Fri, Nov 30, 2018 at 11:06 PM Jan Lukavský <je...@seznam.cz> wrote:

Hi Rui,

yes, there are optimizations that could be added by each layer. The purpose of
the Euphoria layer actually is not to reorder or modify any user operators that
are present in the pipeline (because it might not have enough information to do
this), but it can for instance choose between various join implementations
(shuffle join, broadcast join, ...) - so the optimizations it can do are more
low level. But this plays nicely with the DSL hierarchy - each layer adds a
few more restrictions, but can therefore do more optimizations. And I think
that the layer between SDK and SQL wouldn't have to support SQL optimizations,
it would only have to support a way for SQL to express these optimizations.
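
To make the "choose between join implementations" idea concrete, here is a minimal sketch in plain Java. It is not Euphoria's actual API: the class name, the enum, the size hints and the 64 MiB threshold are all assumptions made for illustration. The hints stand in for the "way for SQL to express these optimizations" - a higher layer passes what it knows, and the lower layer only picks a physical strategy without reordering any user operators.

```java
import java.util.OptionalLong;

/** Hypothetical sketch: picks a physical join strategy from optional size hints. */
final class JoinStrategyChooser {

  enum JoinStrategy { SHUFFLE, BROADCAST_LEFT, BROADCAST_RIGHT }

  // Assumed threshold for illustration; a real layer would make this configurable.
  static final long BROADCAST_LIMIT_BYTES = 64L * 1024 * 1024;

  /**
   * Chooses a strategy. An empty hint means "size unknown", in which case
   * that side is never broadcast.
   */
  static JoinStrategy choose(OptionalLong leftBytes, OptionalLong rightBytes) {
    boolean leftSmall =
        leftBytes.isPresent() && leftBytes.getAsLong() <= BROADCAST_LIMIT_BYTES;
    boolean rightSmall =
        rightBytes.isPresent() && rightBytes.getAsLong() <= BROADCAST_LIMIT_BYTES;

    if (leftSmall && rightSmall) {
      // Both sides fit; broadcast the smaller one.
      return leftBytes.getAsLong() <= rightBytes.getAsLong()
          ? JoinStrategy.BROADCAST_LEFT
          : JoinStrategy.BROADCAST_RIGHT;
    }
    if (rightSmall) {
      return JoinStrategy.BROADCAST_RIGHT;
    }
    if (leftSmall) {
      return JoinStrategy.BROADCAST_LEFT;
    }
    // No usable hint: fall back to the general shuffle join.
    return JoinStrategy.SHUFFLE;
  }
}
```

With no hints at all the sketch falls back to the shuffle join, which matches the idea that the layer acts only on information it has been given.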

Jan

---------- Original e-mail ----------
From: Rui Wang <ruw...@google.com>
To: dev@beam.apache.org
Date: 30. 11. 2018 22:43:04
Subject: Re: [DISCUSS] Structuring Java based DSLs

SQL's optimization is another area to consider for integration. SQL
optimization includes pushing down filters/projections, merging, removing or
swapping plan nodes, and comparing plan costs to choose the best plan. Adding
another layer between SQL and the Java core might require that layer to support
SQL optimizations, if there is a need.

I don't have a clear picture of what SQL needs from Euphoria for
optimization (best case is nothing). As those optimizations are happening or
will happen, we might start to get a sense of it.

-Rui

On Fri, Nov 30, 2018 at 12:38 PM Robert Bradshaw <rober...@google.com> wrote:

I don't really see Euphoria as a subset of SQL or the other way
around, and I think it makes sense to use either without the other, so
by this criterion it makes sense to keep them as siblings rather than
nesting one inside the other.

That said, I think it's really good to have a bunch of shared code,
e.g. a join library that could be used by both. One could even depend
on the other without having to abandon the sibling relationship.
Something like retractions belong in the core SDK itself. Deeper than
that, actually, it should be part of the model.

- Robert

On Fri, Nov 30, 2018 at 7:20 PM David Morávek <d...@apache.org> wrote:

Jan, we made Kryo optional recently (it is a separate module and is used only in tests).
From a quick look it seems that we forgot to remove the compile-time dependency from
euphoria's build.gradle. The only "strong" dependencies I'm aware of are the core SDK
and guava. We'll probably be adding a sketching extension dependency soon.

D.

On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský <je...@seznam.cz> wrote:

Hi Anton,
reactions inline.

---------- Original e-mail ----------
From: Anton Kedin <ke...@google.com>
To: dev@beam.apache.org
Date: 30. 11. 2018 18:17:06
Subject: Re: [DISCUSS] Structuring Java based DSLs

I think this approach makes sense in general, Euphoria can be the
implementation detail of SQL, similar to Join Library or core SDK Schemas.

I wonder though whether it would be better to bring Euphoria closer to core SDK
first, maybe even merge them together. If you look at Reuven's recent work
around schemas it seems like there are already similarities between that and
Euphoria's approach, unless I'm missing the point (e.g. Filter transforms,
FullJoin vs CoGroup... see [2]). And we're already switching parts of SQL to
those transforms (e.g. SQL Aggregation is now implemented by core SDK's
Group[3]).

Yes, these transforms seem to be very similar to those Euphoria has. Whether or
not to merge Euphoria with core is essentially just a decision of the community
(in my point of view).

Adding explicit Schema support to Euphoria will bring it both closer to core
SDK and make it natural to use for SQL. Can this be a first step towards this
integration?

Euphoria currently operates on pure PCollections, so when a PCollection has a
schema, it will be accessible by Euphoria. It makes sense to make use of the
schema in Euphoria - it seems natural on inputs to Euphoria operators, but it
might be tricky (not saying impossible) to actually produce schema-aware
PCollections as outputs from Euphoria operators (generally speaking; in special
cases that might be possible). Regarding inputs, there is actually an intention
to act on the type of the PCollection - e.g. when the PCollection is already of
type KV, it is possible to make the key extractor and value extractor optional
in Euphoria builders, so it feels natural to adapt the builders in the same way
when a schema-aware PCollection is provided, and make use of the provided
schema. The rest of the Euphoria team might correct me if I'm wrong.
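
The "optional extractors" idea can be sketched in plain Java, without Beam on the classpath. Everything here is hypothetical: the class and method names are invented for illustration, `List` stands in for a PCollection and `Map.Entry` stands in for KV. The general entry point requires both extractors; the key-value entry point derives them from the element type, which is the same move one could imagine for schema-aware inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a grouping operator whose extractors are optional for key-value input. */
final class GroupByKeySketch {

  /** General case: the caller must say how to extract the key and the value. */
  static <T, K, V> Map<K, List<V>> group(
      List<T> input, Function<T, K> keyBy, Function<T, V> valueBy) {
    Map<K, List<V>> grouped = new LinkedHashMap<>();
    for (T element : input) {
      grouped
          .computeIfAbsent(keyBy.apply(element), k -> new ArrayList<>())
          .add(valueBy.apply(element));
    }
    return grouped;
  }

  /** Key-value input: the element type already carries both extractors. */
  static <K, V> Map<K, List<V>> groupKv(List<Map.Entry<K, V>> input) {
    return GroupByKeySketch.<Map.Entry<K, V>, K, V>group(
        input, Map.Entry::getKey, Map.Entry::getValue);
  }
}
```

A caller with key-value input would write `GroupByKeySketch.groupKv(pairs)`, while any other element type goes through `group(input, keyBy, valueBy)`.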

One question I have is, does Euphoria bring dependencies that are not needed by
SQL, or does it more or less only rely on the core SDK?

I think the only relevant dependency that Euphoria has besides core SDK is
Kryo. It is the default coder when no coder is provided, but that could be made
optional - e.g. the default coder would be supported only if an appropriate
module would be available. That way I think that Euphoria has no special
dependencies.
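
One way "supported only if an appropriate module is available" could work is a classpath check at runtime. The sketch below is an assumption about how that might look, not how Euphoria does it: the class name, the method names and the string stand-in for a coder are hypothetical. Only `Class.forName` is a real API here.

```java
/** Hypothetical sketch: use a default coder only when its optional module is on the classpath. */
final class OptionalDefaultCoder {

  /** Returns true when the named class can be loaded, without initializing it. */
  static boolean isAvailable(String className) {
    try {
      Class.forName(className, false, OptionalDefaultCoder.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Resolves a coder name. An explicit coder always wins; otherwise the default
   * is used only if the optional module's marker class is present.
   */
  static String resolveCoder(String explicitCoder, String optionalModuleClass) {
    if (explicitCoder != null) {
      return explicitCoder;
    }
    if (isAvailable(optionalModuleClass)) {
      return "default:" + optionalModuleClass;
    }
    throw new IllegalStateException(
        "No coder provided and optional module is missing: " + optionalModuleClass);
  }
}
```

For Kryo the marker class would presumably be `com.esotericsoftware.kryo.Kryo`; a user without that module on the classpath would then get a clear error asking for an explicit coder instead of a hard dependency.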

[1]
https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java#L73
[2]
https://github.com/apache/beam/tree/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
[3]
https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L179

On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský <je...@seznam.cz> wrote:

Hi community,

I'm part of the Euphoria DSL team, and on behalf of this team, I'd like to
discuss possible development of the Java based DSLs currently present in
Beam. To my knowledge, there are currently two DSLs based on the Java SDK -
Euphoria and SQL. These DSLs currently share only the SDK itself,
although there might be room to share some more effort. We already know
that both Euphoria and SQL have a need for retractions, but there are
probably many more features that these two could share.

So, I'd like to open a discussion on what it would cost and what it
would possibly bring, if instead of the current structure

Java SDK

| ---- SQL

| ---- Euphoria

these DSLs would be structured as

Java SDK ---> Euphoria ---> SQL

I'm absolutely sure that this would be a great investment and a huge
change, but I'd like to gather some opinions and general feelings of the
community about this. Some points to start the discussion from my side
would be, that structuring DSLs like this has internal logical
consistency, because each API layer further narrows completeness, but
brings simpler API for simpler tasks, while adding additional high-level
view of the data processing pipeline and thus enabling more
optimizations. On the Euphoria side, these are various join implementations
(the most effective implementation depends on the data), pipeline sampling and
more. Some (or maybe most) of these optimizations would have to be
implemented in both DSLs, so implementing them once is beneficial.
Another benefit is that this would bring Euphoria "closer" to Beam core
development (which would be good, it is part of the project anyway,
right? :)) and help better drive features, that although currently
needed mostly by SQL, might be needed by other Java users anyway.

Thanks for discussion and looking forward to any opinions.

Jan
