On Wed, Oct 3, 2018 at 12:16 PM Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi Anton,
>
> jackson is the JSON extension, similar to what we have for XML. Agree
> that it should be documented.
>
> Agree about join-library.
>
> sketching is a set of statistics extensions providing ready-to-use stats
> CombineFns.
>
> Regards
> JB
>
> On 03/10/2018 20:25, Anton Kedin wrote:
> > Hi dev@,
> >
> > *TL;DR:* `sdks/java/extensions` is hard to discover, navigate and
> > understand.
> >
> > *Current State:*
> >
> > I was looking at `sdks/java/extensions`[1] and realized that I don't
> > know what half of those things are. Only `join library` and `sorter`
> > seem to be documented and discoverable on Beam website, under SDKs
> > section [2].
> >
> > Here's the list of all extensions with my questions/comments:
> >   - /google-cloud-platform-core/. What is this? Is this used in GCP IOs?
> > If so, is `extensions` the right place for it? If it is, then why is it
> > a `-core` extension? It feels like it's a utility package, not an
> > extension;
> >   - /jackson/. I can guess what it is but we should document it
> > somewhere;
> >   - /join-library/. It is documented, but I think we should add more
> > documentation to explain how it works, maybe some caveats, and link
> > to/from the `CoGBK` section of the doc;
>

We should probably also note that applying the join-library twice to join
3 input collections on the same key is less efficient than a single CoGBK
over those 3 input collections.


> >   - /protobuf/. I can probably guess what it is. Is 'extensions' the
> > right place for it though? We use this library in IOs
> > (`PubsubIO.readProtos()`), should we move it to IO then? Same as with
> > the GCP extension, it feels like a utility library, not an extension;
> >   - /sketching/. No idea what to expect from this without reading the
> > code;
> >   - /sorter/. Documented on the website;
> >   - /sql/. This looks familiar :) It is documented but not linked from
> > the extensions section, it's unclear whether it's the whole SQL or just
> > some related components;
> >
> > [1]: https://github.com/apache/beam/tree/master/sdks/java/extensions
> > [2]: https://beam.apache.org/documentation/sdks/java-extensions/
> >
> > *Questions:*
> >
> >   - should we minimally document (at least describe) all extensions and
> > add at least short readme.md's with the links to the Beam website?
> >   - is it the right thing to depend on `extensions` in other components
> > like IOs?
> >   - would it make sense to move some things out of 'extensions'? E.g. IO
> > components to IO or utility package, SQL into new DSLs package;
> >
> > *Opinion:*
> >
> > Maybe I am misunderstanding the intent and meaning of 'extensions', but
> > from my perspective:
> >
> >   - I think that extensions should be more or less isolated from the
> > Beam SDK itself, so that if you delete or modify them, no Beam-internal
> > changes will be required (changes to something that's not an extension).
> > And my feeling is that they should provide value by themselves to users
> > other than SDK authors. They are called 'extensions', not 'critical
> > components' or 'sdk utilities';
> >
> >   - I don't think that IOs should depend on 'extensions'. Otherwise the
> > question is, is it ok for other components, like runners, to do the
> > same? Or even core?
> >
> >   - I think there are a few distinguishable classes of things in
> > 'extensions' right now:
> >       - collections of `PTransforms` with some business logic (Sorter,
> > Join, Sketch);
> >       - collections of `PTransforms` with a focus on parsing (Jackson,
> > Protobuf);
> >       - DSLs: the SQL DSL is more than just a few `PTransforms`, it can
> > be used almost as a standalone SDK. Things like Euphoria will probably
> > end up in the same class;
> >       - utility libraries shared by some parts of the SDK, where it is
> > unclear whether they are valuable by themselves to external users
> > (Protobuf, GCP core);
> >     To me the business logic and parsing libraries do make sense to stay
> > in extensions, but probably under different subdirectories. I think it
> > will make sense to split others out of extensions into separate parts of
> > the SDK.
> >
> >   - I think we should add readme.md's with short descriptions and links
> > to Beam website;
> >
> > Thoughts, comments?
> >
>
