On Wed, Oct 3, 2018 at 12:16 PM Jean-Baptiste Onofré <[email protected]> wrote:
> Hi Anton,
>
> jackson is the JSON extension, as we have for XML. Agree that it should
> be documented.
>
> Agree about join-library.
>
> sketching is a set of statistics extensions providing ready-to-use stats
> CombineFns.
>
> Regards
> JB
>
> On 03/10/2018 20:25, Anton Kedin wrote:
> > Hi dev@,
> >
> > *TL;DR:* `sdks/java/extensions` is hard to discover, navigate and
> > understand.
> >
> > *Current State:*
> >
> > I was looking at `sdks/java/extensions`[1] and realized that I don't
> > know what half of those things are. Only `join library` and `sorter`
> > seem to be documented and discoverable on the Beam website, under the
> > SDKs section [2].
> >
> > Here's the list of all extensions with my questions/comments:
> > - /google-cloud-platform-core/. What is this? Is this used in GCP IOs?
> > If so, is `extensions` the right place for it? If it is, then why is it
> > a `-core` extension? It feels like it's a utility package, not an
> > extension;
> > - /jackson/. I can guess what it is but we should document it
> > somewhere;
> > - /join-library/. It is documented, but I think we should add more
> > documentation to explain how it works, maybe some caveats, and link
> > to/from the `CoGBK` section of the doc;
>
> Should also probably indicate that using the join-library twice on the same with 3 input collections is less efficient than a single CoGBK with those 3 input collections.
>
> > - /protobuf/. I can probably guess what it is. Is 'extensions' the
> > right place for it though? We use this library in IOs
> > (`PubsubIO.readProtos()`), should we move it to IO then? Same as with
> > the GCP extension, it feels like a utility library, not an extension;
> > - /sketching/. No idea what to expect from this without reading the
> > code;
> > - /sorter/. Documented on the website;
> > - /sql/. This looks familiar :) It is documented but not linked from
> > the extensions section, and it's unclear whether it's the whole SQL or
> > just some related components;
> >
> > [1]: https://github.com/apache/beam/tree/master/sdks/java/extensions
> > [2]: https://beam.apache.org/documentation/sdks/java-extensions/
> >
> > *Questions:*
> >
> > - should we minimally document (at least describe) all extensions and
> > add at least short readme.md's with links to the Beam website?
> > - is it the right thing to depend on `extensions` in other components
> > like IOs?
> > - would it make sense to move some things out of 'extensions'? E.g. IO
> > components to IO or a utility package, SQL into a new DSLs package;
> >
> > *Opinion:*
> >
> > Maybe I am misunderstanding the intent and meaning of 'extensions', but
> > from my perspective:
> >
> > - I think that extensions should be more or less isolated from the
> > Beam SDK itself, so that if you delete or modify them, no Beam-internal
> > changes will be required (changes to something that's not an extension).
> > And my feeling is that they should provide value by themselves to users
> > other than SDK authors. They are called 'extensions', not 'critical
> > components' or 'sdk utilities';
> >
> > - I don't think that IOs should depend on 'extensions'. Otherwise the
> > question is, is it ok for other components, like runners, to do the
> > same? Or even core?
> >
> > - I think there are a few distinguishable classes of things in
> > 'extensions' right now:
> >   - collections of `PTransforms` with some business logic (Sorter,
> >     Join, Sketch);
> >   - collections of `PTransforms` with a focus on parsing (Jackson,
> >     Protobuf);
> >   - DSLs; SQL DSL with more than just a few `PTransforms`, it can be
> >     used almost as a standalone SDK. Things like Euphoria will probably
> >     end up in the same class;
> >   - utility libraries shared by some parts of the SDK, where it is
> >     unclear if they are valuable by themselves to external users
> >     (Protobuf, GCP core);
> > To me the business logic and parsing libraries do make sense to stay
> > in extensions, but probably under different subdirectories. I think it
> > will make sense to split the others out of extensions into separate
> > parts of the SDK.
> >
> > - I think we should add readme.md's with short descriptions and links
> > to the Beam website;
> >
> > Thoughts, comments?
