Thanks for raising this. I agree our docs follow our implementation too closely (which in turn has grown organically and is not straightforward).
There were some related discussions about organizing modules (or, possibly, creating a new hierarchy that would selectively import things) for better discoverability. I wonder if this would lend itself to better documentation as well.

On Fri, Apr 30, 2021 at 4:46 PM Stephan Hoyer <[email protected]> wrote:
>
> (Note: I also filed this as a JIRA [1] a few days ago, but I noticed that the
> mailing list seems to be a better place for opening discussions.)
>
> I've been enjoying diving into Beam recently, but to my frustration I've
> found that I often need to look through the source code to discover APIs.
>
> Beam has some really nice documentation on its website (I particularly love
> the "transform catalog") but I find the "API docs" [1] to be nearly unusable,
> at least for the Python SDK. For example, try clicking on any of the
> sub-headings, e.g., apache_beam.io [3]. It's a long, heavily nested listing
> of the raw internal structure of Beam's python modules.
>
> To enumerate my concerns:
> 1. It's hard to navigate. I need to know exactly where a function is defined
> to find it. E.g., to find beam.Map, I had to click on "apache_beam.transforms
> package" followed by "apache_beam.transforms.core module" and then scroll
> down or search in the page for "Map."
> 2. It isn't clear exactly which components are public APIs. The documentation
> for a few modules notes that they are not public, but there are so many
> others listed that I'm sure they cannot all be intended for public support.
> This makes it hard to find Beam's main public APIs.
> 3. It isn't clear the preferred import paths to use. For example,
> apache_beam.Map is documented as apache_beam.transforms.core.Map, without
> mention of the shorter name.
>
> I suspect the source of most of these issues is that the API docs make heavy
> use of Sphinx's autodoc for modules. In my experience maintaining Python
> projects, this just doesn't work very well. autosummary and autofunction on
> individual functions/classes work well, but it needs to be organized by hand
> – you can't count on automodule to do a good job of high level organization.
> JAX's docs are a good example, e.g., see the source code [4] and rendered
> HTML [5].
>
> This would definitely be a bit of work, but is relatively straightforward to
> set-up and I think would pay big dividends for discoverability of Beam's API.
> I've gone through this process a few times for different projects, so I would
> be happy to advise if/as issues come up.
>
> Cheers,
> Stephan
>
> [1] https://issues.apache.org/jira/browse/BEAM-12235
> [2] https://beam.apache.org/releases/pydoc/2.28.0/index.html
> [3] https://beam.apache.org/releases/pydoc/2.28.0/apache_beam.io.html
> [4] https://github.com/google/jax/blob/master/docs/jax.rst
> [5] https://jax.readthedocs.io/en/latest/jax.html
