Jan, we recently made Kryo optional (it is now a separate module and is used only in tests). From a quick look, it seems we forgot to remove the compile-time dependency from Euphoria's *build.gradle*. The only "strong" dependencies I'm aware of are the core SDK and Guava. We'll probably be adding a dependency on the sketching extension soon.
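To make the "default coder only when the module is available" idea concrete, here is a minimal sketch of probing the classpath for an optional dependency. It is not Beam or Euphoria code: the class `OptionalModules`, its methods, and the returned coder name are hypothetical and only illustrate the approach. The only real identifier is Kryo's entry-point class name.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Beam or Euphoria. */
final class OptionalModules {

  /** Fully qualified name of Kryo's main class. */
  static final String KRYO_CLASS = "com.esotericsoftware.kryo.Kryo";

  private OptionalModules() {}

  /** Returns true if the named class can be found on the classpath. */
  static boolean isPresent(String className) {
    try {
      // initialize=false: only probe for the class, do not run its static initializers.
      Class.forName(className, false, OptionalModules.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /** Names a default coder (assumed label "kryo"), or empty when the optional module is missing. */
  static Optional<String> defaultCoderName() {
    return isPresent(KRYO_CLASS) ? Optional.of("kryo") : Optional.empty();
  }

  public static void main(String[] args) {
    System.out.println("Kryo on classpath: " + isPresent(KRYO_CLASS));
    System.out.println(
        "Default coder: " + defaultCoderName().orElse("none, a coder must be given explicitly"));
  }
}
```

With a check like this, the module that needs a fallback coder would not need a compile-time dependency on Kryo; users who want the default would add the optional module themselves.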
D.

On Fri, Nov 30, 2018 at 7:08 PM Jan Lukavský <je...@seznam.cz> wrote:

> Hi Anton,
> reactions inline.
>
> ---------- Original e-mail ----------
> From: Anton Kedin <ke...@google.com>
> To: dev@beam.apache.org
> Date: 30. 11. 2018 18:17:06
> Subject: Re: [DISCUSS] Structuring Java based DSLs
>
> I think this approach makes sense in general, Euphoria can be the
> implementation detail of SQL, similar to Join Library or core SDK Schemas.
>
> I wonder though whether it would be better to bring Euphoria closer to
> core SDK first, maybe even merge them together. If you look at Reuven's
> recent work around schemas it seems like there are already similarities
> between that and Euphoria's approach, unless I'm missing the point (e.g.
> Filter transforms, FullJoin vs CoGroup... see [2]). And we're already
> switching parts of SQL to those transforms (e.g. SQL Aggregation is now
> implemented by core SDK's Group[3]).
>
> Yes, these transforms seem to be very similar to those Euphoria has.
> Whether or not to merge Euphoria with core is essentially just a decision
> of the community (from my point of view).
>
> Adding explicit Schema support to Euphoria will bring it both closer to
> core SDK and make it natural to use for SQL. Can this be a first step
> towards this integration?
>
> Euphoria currently operates on pure PCollections, so when a PCollection
> has a schema, it will be accessible by Euphoria. It makes sense to make
> use of the schema in Euphoria - it seems natural on inputs to Euphoria
> operators, but it might be tricky (not saying impossible) to actually
> produce schema-aware PCollections as outputs from Euphoria operators
> (generally speaking, in special cases that might be possible). Regarding
> inputs, there is actually an intention to act on the type of the
> PCollection - e.g. when the PCollection is already of type KV, then it is
> possible to make the key extractor and value extractor optional in
> Euphoria builders, so it feels natural to enable changing the builders
> when given a schema-aware PCollection, and make use of the provided
> schema. The rest of the Euphoria team might correct me, if I'm wrong.
>
> One question I have is, does Euphoria bring dependencies that are not
> needed by SQL, or does it more or less only rely on the core SDK?
>
> I think the only relevant dependency that Euphoria has besides core SDK is
> Kryo. It is the default coder when no coder is provided, but that could be
> made optional - e.g. the default coder would be supported only if an
> appropriate module were available. That way I think that Euphoria has
> no special dependencies.
>
> [1]
> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/Group.java#L73
> [2]
> https://github.com/apache/beam/tree/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
> [3]
> https://github.com/apache/beam/blob/f66eb5fe23b2500b396e6f711cdf4aeef6b31ab8/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamAggregationRel.java#L179
>
> On Fri, Nov 30, 2018 at 6:29 AM Jan Lukavský <je...@seznam.cz> wrote:
>
> Hi community,
>
> I'm part of the Euphoria DSL team, and on behalf of this team, I'd like to
> discuss possible development of the Java based DSLs currently present in
> Beam. To my knowledge, there are currently two DSLs based on the Java SDK -
> Euphoria and SQL. These DSLs currently share only the SDK itself,
> although there might be room to share some more effort. We already know
> that both Euphoria and SQL have a need for retractions, but there are
> probably many more features that these two could share.
>
> So, I'd like to open a discussion on what it would cost and what it
> would possibly bring, if instead of the current structure
>
>   Java SDK
>
>     | ---- SQL
>
>     | ---- Euphoria
>
> these DSLs would be structured as
>
>   Java SDK ---> Euphoria ---> SQL
>
> I'm absolutely sure that this would be a great investment and a huge
> change, but I'd like to gather some opinions and general feelings of the
> community about this. Some points to start the discussion from my side
> would be that structuring DSLs like this has internal logical
> consistency, because each API layer further narrows completeness, but
> brings a simpler API for simpler tasks, while adding an additional
> high-level view of the data processing pipeline and thus enabling more
> optimizations. On the Euphoria side, these are various implementations of
> joins (the most effective implementation depends on the data), pipeline
> sampling and more. Some (or maybe most) of these optimizations would have
> to be implemented in both DSLs, so implementing them once is beneficial.
> Another benefit is that this would bring Euphoria "closer" to Beam core
> development (which would be good, it is part of the project anyway,
> right? :)) and help better drive features that, although currently
> needed mostly by SQL, might be needed by other Java users anyway.
>
> Thanks for discussion and looking forward to any opinions.
>
> Jan