Preparatory PR here: https://github.com/apache/airflow/pull/17625 - this way we get a list of all secrets/logging handlers in provider.yaml and we can use them to generate the doc (and provider info will show them too).
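
To illustrate the idea, here is a minimal sketch (not the code from the PR) of how such a list could be turned into a reference page. It assumes each provider.yaml has already been parsed into a dict and that the backends are registered under a "secrets-backends" key. The key name, the sample entries and the render_backend_reference helper are assumptions made for illustration only.

```python
from typing import Dict, List

# Hand-written stand-ins for parsed provider.yaml files.
# ASSUMPTION: the "secrets-backends" key and these entries are illustrative,
# they are not read from the real provider.yaml files.
PROVIDERS: List[Dict] = [
    {
        "package-name": "apache-airflow-providers-hashicorp",
        "secrets-backends": [
            "airflow.providers.hashicorp.secrets.vault.VaultBackend",
        ],
    },
    {
        "package-name": "apache-airflow-providers-amazon",
        "secrets-backends": [
            "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
        ],
    },
    # A provider that registers nothing in this section is skipped.
    {"package-name": "apache-airflow-providers-sqlite"},
]


def render_backend_reference(providers: List[Dict], section: str = "secrets-backends") -> str:
    """Return RST text listing every class registered under `section`, grouped by package."""
    lines: List[str] = []
    for provider in sorted(providers, key=lambda p: p["package-name"]):
        classes = provider.get(section, [])
        if not classes:
            continue
        name = provider["package-name"]
        # RST section title: the underline must be at least as long as the title.
        lines.append(name)
        lines.append("-" * len(name))
        lines.append("")
        for class_path in sorted(classes):
            lines.append(f"* :class:`~{class_path}`")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_backend_reference(PROVIDERS))
```

The same function could produce the logging handler list by passing a different section name, provided the handlers are registered in provider.yaml in the same way.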
On Sun, Aug 15, 2021 at 6:00 PM Kaxil Naik wrote:
> 100% agree with Kamil -- They are fundamentally separate and can get out
> of date as they are published separately.
>
> Kamil's proposal looks good to me.
>
> On Sun, Aug 15, 2021 at 12:52 AM .... wrote:
>
>> I understand the user's perspective and that it is currently difficult
>> to discover the list of secrets backends/task handlers that are
>> distributed in providers packages. I just want to point out that
>> including this list directly in the apache-airflow documentation
>> package will have consequences. I would prefer to explain the
>> difference between the two types of integration and redirect the user
>> to another page where they can get detailed information.
>>
>> There are a few problems that I can see from putting this listing
>> directly on this page:
>> 1. apache-airflow has a different publishing cycle than the
>> provider packages, so the list will go out of date.
>> 2. The documentation package for an old version of apache-airflow
>> will only list the integrations that were known when that version
>> was released. We can later release integrations that are still
>> compatible with it, but they will not have been known at the time
>> of that apache-airflow release.
>> 3. We do not have *.py files on the v2-*-test branch, so we cannot
>> verify that the documentation is correct.
>> 4. We mix two types of documentation - guides and references. This can
>> make this page difficult to understand and difficult to find.
>>
>> What I have in mind is this kind of template (it shows how the
>> secrets page should look, but it should be applied to task handlers
>> in similar cases):
>>
>> apache-airflow/security/secrets/secrets-backend/index.rst
>> ############################################
>>
>> Secret Backends:
>> =============
>>
>> <Paragraph Describe Secret backends in general>
>>
>> # Available Secret backends
>>
>> Airflow has a built-in backend, but most secret backends are
>> distributed independently of it. That means you need to install them
>> separately, but it's very easy with pip. This also means that you
>> can update the secret backend independently of the Airflow core, or
>> use a secret backend that was released after this Airflow version
>> was released.
>>
>> ## Core Airflow Secret backends:
>> * <File backend> - link pointing to it
>>
>> ## Backends Provided by community-managed providers:
>>
>> The list of secret backends managed by the community is available in
>> the providers packages documentation: :doc:`Secret backend reference
>> <apache-airflow-providers: providers>`__
>>
>>
>> ##########################################
>>
>> apache-airflow-providers/secrets-backend-ref.rst
>> ############################################
>>
>> Secret backends reference
>> =====================
>>
>> Here’s the list of the secret backends which are available in this
>> release in providers packages.
>> For general information on secret backends, or on the built-in
>> secret backend, see: <LINK TO SECRET BACKEND>
>>
>> * <VaultBackend>
>> * <AWSSecretBackend>
>> * <KMSBackend>
>>
>> ###########################################
>>
>> The existing page describing the operators is similar to my proposal,
>> so you can see it in the wild:
>>
>> http://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html#operators
>>
>> On Sat, Aug 14, 2021 at 7:19 PM Jarek Potiuk wrote:
>> >
>> > > I am concerned about adding information about the content of provider
>> > > packages in the core documentation as it is very easy to get obsolete
>> >
>> > I agree we should not put any provider details in "core". But we
>> > should at the very least (I think) put links to all the "community"
>> > providers that implement certain features.
>> >
>> > This is really a "discoverability" problem, nothing more. I think we -
>> > long term committers who know all about airflow, providers, etc. - are
>> > overestimating users' knowledge of airflow internals, and the
>> > documentation should be there to guide them as they learn.
>> > There was this - very relevant - comic from XKCD the day before
>> > yesterday, https://xkcd.com/2501/, that shows the mechanism very well.
>> >
>> > I tried to put myself in the shoes of a new user. Try to do it as
>> > well, Kamil.
>> >
>> > When you look at the "logging" or "secrets" section, you are completely
>> > unaware that you can get AWS, GCP and other integrations provided by the
>> > community. And there is NOTHING to tell you otherwise. You need to know
>> > that you should start looking elsewhere - and I want to help the people
>> > who are looking at the page by giving them links to where they can
>> > find it.
>> > Essentially, when you do not know airflow, do not realise that there
>> > are providers, and do not realise that those providers implement those
>> > features, you leave with the impression that a lot of stuff is missing.
>> >
>> > With the current documentation structure, I am afraid people simply do
>> > not know that there are community-managed implementations out there.
>> >
>> > What I have in mind is this kind of template (it shows how the
>> > secrets page should look, but it should be applied across the board
>> > in similar cases):
>> >
>> >
>> > ############################################
>> >
>> > Secret Backends Page:
>> >
>> > Paragraph Describe Secret backends in general
>> >
>> > # Available Secret backends
>> >
>> > ## Core Airflow Secret backends:
>> > * <File backend> - link pointing to it
>> >
>> > ## Backends Provided by community-managed providers:
>> > * <VaultBackend>
>> > * <AWSSecretBackend>
>> > * <KMSBackend>
>> >
>> > ##########################################
>> >
>> > I am thinking of just links to the appropriate documentation available
>> > in providers. No more, no less. This could be applied (automatically)
>> > to all functionalities provided by providers.
>> > I think this is safe, can be automated and solves the discoverability
>> > problem. It does not require extra maintenance.
>> >
>> >
>> > J.
>> >
>> >
>> > On Sat, Aug 14, 2021 at 6:40 PM .... wrote:
>> >>
>> >> Commented above
>> >>
>> >> On Fri, Aug 13, 2021 at 3:48 AM Jarek Potiuk wrote:
>> >> >
>> >> > * List (and link) available logging options at
>> >> > https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html?highlight=remote%20log#advanced-configuration
>> >> > You will not find a list of implemented integrations on this page -
>> >> > you should look for details of advanced logging in providers (but
>> >> > it's not at all obvious where, or that they exist at all). There are
>> >> > no links to S3/GCS logging configuration/handling and it's not easy
>> >> > to find out where you should look for them. Better examples would
>> >> > also be useful.
>> >> >
>> >> > * Secret Backends page is a bit better -
>> >> > https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html
>> >> > At least it mentions GCP/Hashicorp as "examples", but it misses the
>> >> > AWS one, and when you go to "Supported Backends" you see only the
>> >> > "Local Filesystem" one. I think it is really misleading that you do
>> >> > not have a full list of secret backends in the community-managed
>> >> > providers.
>> >> >
>> >>
>> >> I am concerned about adding information about the content of provider
>> >> packages in the core documentation as it is very easy to get obsolete:
>> >> Airflow and the packages have different release cycles, and the new
>> >> packages are compatible with the old Airflow versions, so it may not
>> >> be obvious that you should be looking at the latest documentation for
>> >> Airflow to know the full list of providers even if you are using a
>> >> non-latest version of Airflow.
>> >>
>> >> I think it's worth taking an approach similar to operators, where the
>> >> core documentation does not contain the full list of operators from
>> >> the provider packages, but only contains a list of operators in the
>> >> core, and includes references to the documentation for providers that
>> >> includes this list of operators in provider packages.
>> >> Here is a reference of all core operators:
>> >> https://airflow.apache.org/docs/apache-airflow/stable/operators-and-hooks-ref.html
>> >> Here is a reference of all operators in providers packages:
>> >> https://airflow.apache.org/docs/apache-airflow-providers/operators-and-hooks-ref/index.html
>> >>
>> >> The list of operators in the providers packages is automatically
>> >> generated on the basis of provider.yaml files, and the correctness of
>> >> those files is automatically verified, so we can be sure that the
>> >> reference is up-to-date and complete. This also reduces the
>> >> maintenance burden of this documentation.
>> >>
>> >> Adding the secrets backends and task handlers to provider.yaml also
>> >> means that information about them will be available on the main page
>> >> of the project in the "Integrations" section.
>> >
>> >
>> > --
>> > +48 660 796 129
