We should probably add API Auth Backends too?

On Sun, Aug 15, 2021 at 6:59 PM Jarek Potiuk <[email protected]> wrote:
> Preparatory PR here: https://github.com/apache/airflow/pull/17625 - this
> way we get a list of all secrets/logging handlers in provider.yaml and we
> can use them to generate the doc (and provider info will show them too).
>
> On Sun, Aug 15, 2021 at 6:00 PM Kaxil Naik <[email protected]> wrote:
>
>> 100% agree with Kamil -- They are fundamentally separate and can get out
>> of date as they are published separately.
>>
>> Kamil's proposal looks good to me.
>>
>> On Sun, Aug 15, 2021 at 12:52 AM .... <[email protected]> wrote:
>>
>>> I understand the user's perspective and that it is currently difficult
>>> to discover the list of secrets backends/task handlers that are
>>> distributed in providers packages. I just want to point out that
>>> including this list directly in the apache-airflow documentation
>>> package will have consequences. I would prefer to explain the
>>> difference between the two types of integration and redirect the user
>>> to another page where they can get detailed information.
>>>
>>> There are a few problems that I can see from putting this listing
>>> directly on this page:
>>> 1. The apache-airflow package has a different publishing cycle than the
>>> provider packages, so the list will be out of date.
>>> 2. Packages for the old version of apache-airflow will contain
>>> information on the integration set that is known only at the time of
>>> the release of that version. We can release integrations that will
>>> still be compatible, but will not be known at the time of the release
>>> of the apache-airflow version.
>>> 3. We do not have *.py files on the v2-*-test branch, so we cannot
>>> verify that the documentation is correct.
>>> 4. We mix two types of documentation - guides and references. This can
>>> make this page difficult to understand and difficult to find.
>>>
>>> What I am really thinking of is this kind of formula (it shows how
>>> secrets should look, but it should be applied to task handlers in
>>> similar cases):
>>>
>>> apache-airflow/security/secrets/secrets-backend/index.rst
>>> ############################################
>>>
>>> Secret Backends:
>>> ================
>>>
>>> <Paragraph Describe Secret backends in general>
>>>
>>> # Available Secret backends
>>>
>>> Airflow has a built-in backend, but most of the secret backends are
>>> distributed independently of it. That means you need to install them
>>> separately, but it's very easy with pip. This also means that you
>>> can update the secret backend independently of the Airflow core, or
>>> use a secret backend that was released after this Airflow version
>>> was released.
>>>
>>> ## Core Airflow Secret backends:
>>> * <File backend> - link pointing to it
>>>
>>> ## Backends Provided by community-managed providers:
>>>
>>> The list of secret backends managed by the community is available in
>>> the providers packages documentation: :doc:`Secret backend reference
>>> <apache-airflow-providers: providers>`__
>>>
>>>
>>> ##########################################
>>>
>>> apache-airflow-providers/secrets-backend-ref.rst
>>> ############################################
>>>
>>> Secret backends reference
>>> =========================
>>>
>>> Here’s the list of the secret backends which are available in this
>>> release in providers packages.
>>> For general information on secret backends, or the built-in secret
>>> backend, see: <LINK TO SECRET BACKEND>
>>>
>>> * <VaultBackend>
>>> * <AWSSecretBackend>
>>> * <KMSBackend>
>>>
>>> ###########################################
>>>
>>> The existing page describing the operators is similar to my proposal,
>>> so you can see it in the wild:
>>>
>>> http://airflow.apache.org/docs/apache-airflow/stable/concepts/operators.html#operators
>>>
>>> On Sat, Aug 14, 2021 at 7:19 PM Jarek Potiuk <[email protected]> wrote:
>>> >
>>> > > I am concerned about adding information about the content of provider
>>> > > packages in the core documentation as it is very easy to get obsolete
>>> >
>>> > I agree we should not put any provider details in "core". But we
>>> > should at the very least (I think) put links to all the "community"
>>> > providers that implement certain features.
>>> >
>>> > This is really a "discoverability" problem, nothing more. I think we -
>>> > long term committers who know all about airflow, providers, etc. - are
>>> > overestimating users' knowledge about airflow internals, and the
>>> > documentation should be there to guide them to learn.
>>> > There was this - very relevant - comic from XKCD the day before
>>> > yesterday, https://xkcd.com/2501/, that shows the mechanism very well.
>>> >
>>> > I tried to put myself in the shoes of a new user. Try to do it as
>>> > well, Kamil.
>>> >
>>> > When you look at the "logging" or "secrets" section, you are
>>> > completely unaware that you can get AWS, GCP and other integrations
>>> > provided by the community. And there is NOTHING to tell you otherwise.
>>> > You need to know that you should start looking elsewhere - and I want
>>> > to help the people who are looking at the page by giving them links
>>> > where they can find it.
>>> > Essentially, when you do not know Airflow, do not realise that there
>>> > are providers, and do not realise that those providers implement those
>>> > features, you leave with the impression that a lot of stuff is missing.
>>> >
>>> > With the current documentation structure, I am afraid people simply do
>>> > not even know that there are community-managed implementations out there.
>>> >
>>> > What I am really thinking of is this kind of formula (it shows how
>>> > secrets should look, but it should be applied across the board in
>>> > similar cases):
>>> >
>>> >
>>> > ############################################
>>> >
>>> > Secret Backends Page:
>>> >
>>> > Paragraph Describe Secret backends in general
>>> >
>>> > # Available Secret backends
>>> >
>>> > ## Core Airflow Secret backends:
>>> > * <File backend> - link pointing to it
>>> >
>>> > ## Backends Provided by community-managed providers:
>>> > * <VaultBackend>
>>> > * <AWSSecretBackend>
>>> > * <KMSBackend>
>>> >
>>> > ##########################################
>>> >
>>> > I am thinking of just links to the appropriate documentation available
>>> > in providers. No more, no less. This could be applied (automatically)
>>> > to all functionalities provided by providers.
>>> > I think this is safe, can be automated and solves the discoverability
>>> > problem. It does not require extra maintenance.
>>> >
>>> >
>>> > J.
>>> >
>>> >
>>> >
>>> >
>>> > On Sat, Aug 14, 2021 at 6:40 PM .... <[email protected]> wrote:
>>> >>
>>> >> Commented above
>>> >>
>>> >> On Fri, Aug 13, 2021 at 3:48 AM Jarek Potiuk <[email protected]> wrote:
>>> >> >
>>> >>
>>> >> > * List (and link) available logging options at
>>> >> > https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html?highlight=remote%20log#advanced-configuration
>>> >> > You will not find a list of implemented integrations on this page -
>>> >> > you should look for details of advanced logging in providers (but it's
>>> >> > not at all obvious where, or that they exist at all). There are no
>>> >> > links to S3/GCS logging configuration/handling and it's not easy to
>>> >> > find out where you should look for them. Better examples would also
>>> >> > be useful.
>>> >> >
>>> >> > * Secret Backends page is a bit better -
>>> >> > https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html
>>> >> > At least it mentions GCP/Hashicorp as "examples" but it misses the AWS
>>> >> > one, and when you go to "Supported Backends" you see only the "Local
>>> >> > Filesystem" one. I think it is really misleading that you do not have
>>> >> > a full list of secret backends in the community-managed providers.
>>> >> >
>>> >>
>>> >> I am concerned about adding information about the content of provider
>>> >> packages in the core documentation, as it can very easily become
>>> >> obsolete: Airflow and the packages have different release cycles, and
>>> >> new packages are compatible with old Airflow versions, so it may not
>>> >> be obvious that you should be looking at the latest documentation for
>>> >> Airflow to know the full list of providers even if you are using a
>>> >> non-latest version of Airflow.
>>> >>
>>> >> I think it's worth taking an approach similar to operators, where the
>>> >> core documentation does not contain the full list of operators from
>>> >> the provider packages, but only contains a list of operators in the
>>> >> core, and includes references to the documentation for providers that
>>> >> includes this list of operators in provider packages.
>>> >> Here is a reference of all core operators:
>>> >> https://airflow.apache.org/docs/apache-airflow/stable/operators-and-hooks-ref.html
>>> >> Here is a reference of all operators in providers packages:
>>> >> https://airflow.apache.org/docs/apache-airflow-providers/operators-and-hooks-ref/index.html
>>> >>
>>> >> The list of operators in the providers packages is automatically
>>> >> generated on the basis of provider.yaml files and the correctness of
>>> >> the files is automatically verified, so we can be sure that the
>>> >> reference is up-to-date and complete. This also reduces the
>>> >> maintenance burden of this documentation.
>>> >>
>>> >> Adding the secrets backends and task handlers to provider.yaml also
>>> >> means that information about them will be available on the main page
>>> >> of the project in the "Integrations" section.
>>> >
>>> >
>>> >
>>> > --
>>> > +48 660 796 129
>>>
>>
>
> --
> +48 660 796 129
>
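
To make the "generate the reference from provider.yaml" idea in the thread concrete, here is a minimal sketch of what such a generator could look like. It is not the actual Airflow docs tooling. The `PROVIDERS` list is a hand-written stand-in for already-parsed provider.yaml files, and the `secrets-backends` key, the class paths, and the function name `render_secrets_backend_reference` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for data parsed from provider.yaml files.
# The "secrets-backends" key and the class paths are assumptions for
# illustration only, not a statement of the real schema.
PROVIDERS: List[Dict[str, object]] = [
    {
        "package-name": "apache-airflow-providers-hashicorp",
        "secrets-backends": [
            "airflow.providers.hashicorp.secrets.vault.VaultBackend",
        ],
    },
    {
        "package-name": "apache-airflow-providers-amazon",
        "secrets-backends": [
            "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend",
        ],
    },
    # A provider that ships no secrets backend is skipped in the output.
    {"package-name": "apache-airflow-providers-sqlite"},
]


def render_secrets_backend_reference(providers: List[Dict[str, object]]) -> str:
    """Render an RST reference page listing secrets backends per provider."""
    title = "Secret backends reference"
    lines = [title, "=" * len(title), ""]
    # Sort by package name so the generated page is stable between runs.
    for provider in sorted(providers, key=lambda p: str(p["package-name"])):
        backends = provider.get("secrets-backends", [])
        if not backends:
            continue
        name = str(provider["package-name"])
        lines += [name, "-" * len(name), ""]
        lines += [f"* :class:`~{path}`" for path in sorted(backends)]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_secrets_backend_reference(PROVIDERS))
```

Because the page is derived only from provider metadata, it can be rebuilt whenever a provider is released, which is the property the thread relies on to keep the providers reference independent of the core release cycle.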
