Hi dev@,

*TL;DR:* `sdks/java/extensions` is hard to discover, navigate and understand.

*Current State:*

I was looking at `sdks/java/extensions` [1] and realized that I don't know what half of those things are. Only `join-library` and `sorter` seem to be documented and discoverable on the Beam website, under the SDKs section [2].

Here's the list of all extensions with my questions/comments:

- *google-cloud-platform-core*. What is this? Is it used in the GCP IOs? If so, is `extensions` the right place for it? If it is, then why is it a `-core` extension? It feels like a utility package, not an extension;
- *jackson*. I can guess what it is, but we should document it somewhere;
- *join-library*. It is documented, but I think we should add more documentation to explain how it works, maybe some caveats, and link to/from the `CoGBK` section of the docs;
- *protobuf*. I can probably guess what it is. Is `extensions` the right place for it though? We use this library in IOs (`PubsubIO.readProtos()`), so should we move it to IO then? Same as with the GCP extension, it feels like a utility library, not an extension;
- *sketching*. No idea what to expect from this without reading the code;
- *sorter*. Documented on the website;
- *sql*. This looks familiar :) It is documented but not linked from the extensions section, and it's unclear whether it's the whole SQL or just some related components;

[1]: https://github.com/apache/beam/tree/master/sdks/java/extensions
[2]: https://beam.apache.org/documentation/sdks/java-extensions/

*Questions:*

- should we minimally document (at least describe) all extensions and add at least short readme.md's with links to the Beam website?
- is it the right thing to depend on `extensions` in other components like IOs?
- would it make sense to move some things out of `extensions`? E.g. IO components to IO or a utility package, SQL into a new DSLs package;

*Opinion:*

Maybe I am misunderstanding the intent and meaning of 'extensions', but from my perspective:

- I think that extensions should be more or less isolated from the Beam SDK itself, so that if you delete or modify them, no Beam-internal changes are required (that is, no changes to anything that's not an extension). And my feeling is that they should provide value by themselves to users other than SDK authors. They are called 'extensions', not 'critical components' or 'sdk utilities';
- I don't think that IOs should depend on 'extensions'. Otherwise the question is, is it ok for other components, like runners, to do the same? Or even core?
- I think there are a few distinguishable classes of things in 'extensions' right now:
  - collections of `PTransforms` with some business logic (Sorter, Join, Sketching);
  - collections of `PTransforms` with a focus on parsing (Jackson, Protobuf);
  - DSLs: the SQL DSL is more than just a few `PTransforms` and can be used almost as a standalone SDK. Things like Euphoria will probably end up in the same class;
  - utility libraries that are shared by some parts of the SDK, where it's unclear whether they are valuable by themselves to external users (Protobuf, GCP core);

  To me it makes sense for the business logic and parsing libraries to stay in extensions, but probably under different subdirectories. I think it would make sense to split the others out of extensions into separate parts of the SDK.
- I think we should add readme.md's with short descriptions and links to the Beam website;

Thoughts, comments?
