kaxil edited a comment on issue #12983:
URL: https://github.com/apache/airflow/issues/12983#issuecomment-761819429


   >All such cross references are already automatically checked (via pre-commit) and stored in https://github.com/apache/airflow/blob/master/airflow/providers/dependencies.json.
   
   This is not the kind of cross-reference we (or at least I) are talking about. The Google provider docs, for example, reference the XCom documentation from the apache-airflow (core) docs. Example:
   https://github.com/apache/airflow/blob/master/docs/apache-airflow-providers-google/operators/marketing_platform/campaign_manager.rst
   
   Similarly, the apache-airflow (core) docs reference `~airflow.providers.http.operators.http.SimpleHttpOperator`, `~airflow.providers.sqlite.operators.sqlite.SqliteOperator`, and `~airflow.providers.jdbc.hooks.jdbc.JdbcHook`.
   
   So at the very least, we need to build the following docs (see the sketch after the list):
   
   - `apache-airflow`
   - `apache-airflow-providers`
   - `apache-airflow-providers-PROVIDER_THAT_CHANGES`
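
   To make this concrete, here is a rough Python sketch (hypothetical, not the actual selective-checks logic in CI) of how the set of docs packages to build could be derived from the files changed in a PR. The function name and the flat `parts[2]` provider lookup are simplifying assumptions (nested providers such as `apache.hive` would need a deeper mapping):

   ```python
   # Hypothetical sketch: derive the docs packages to (re)build from the changed files.
   from typing import List, Set


   def docs_packages_to_build(changed_files: List[str]) -> Set[str]:
       # Core and the providers index always need rebuilding, because they
       # cross-reference provider docs (and vice versa), as shown below.
       packages = {"apache-airflow", "apache-airflow-providers"}
       for path in changed_files:
           parts = path.split("/")
           if path.startswith("airflow/providers/"):
               # Code change in a provider, e.g. airflow/providers/google/...
               # (simplified: nested providers like apache/hive need extra handling)
               packages.add(f"apache-airflow-providers-{parts[2]}")
           elif path.startswith("docs/apache-airflow-providers-"):
               # Docs-only change for a provider, e.g. docs/apache-airflow-providers-google/...
               packages.add(parts[1])
       return packages


   print(docs_packages_to_build(["airflow/providers/google/cloud/operators/bigquery.py"]))
   # -> {'apache-airflow', 'apache-airflow-providers', 'apache-airflow-providers-google'}
   ```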
   
   
   >Now I want us to consider what makes more sense:
   > 1. Optimizing #13706 - doc/*.rst documentation-only change
   > 2. Optimizing docs build per-provider - only one provider 
code+documentation changes
   
   (2) is not possible for the reason I explained above. We need (3): optimizing the docs build for **core** + **apache-airflow-providers** + per-provider -- i.e. when only one provider's code + documentation changes, build that provider's docs plus the **core** docs plus **apache-airflow-providers**.
   
   
   >I'd argue changing only documentation is a bad smell. Documentation should 
usually be changed together with the code when code change happens. So I argue 
that those builds will be extremely rare in the future. And they will mostly 
not impact the people who are making 'substantial' changes - those who need 
fast feedback on their builds.
   
   Wrong, we encourage doc-only changes from new contributors, who can add a missing section, fix formatting issues, or correct outdated information -- that is definitely not a bad smell. Documentation has to be a first-class citizen and is important for us as a project, all the more so because we are an OSS project. The history of doc-only changes (based on the number of commits) is also large. And documentation will always keep evolving (getting better) -- not only now, right after we released a major version.
   
   >We do not have to build "airflow" docs in this case - there should be "0" references from airflow to particular providers docs.

   I don't think so. It is completely fine for the core docs to reference, for example, the Slack provider to explain the `sla_miss_callback` functionality or the Secrets Backend.
   
   Example references:
   ```
   best-practices.rst:207:Similarly, if you have a task that starts a microservice in Kubernetes or Mesos, you should check if the service has started or not using :class:`airflow.providers.http.sensors.http.HttpSensor`.
   concepts.rst:434:by pre-installing some :doc:`apache-airflow-providers:index` packages (they are always available no
   concepts.rst:437:- :class:`~airflow.providers.http.operators.http.SimpleHttpOperator` - sends an HTTP request
   concepts.rst:438:- :class:`~airflow.providers.sqlite.operators.sqlite.SqliteOperator` - SQLite DB operator
   concepts.rst:443:additional packages manually (for example ``apache-airflow-providers-mysql`` package).
   concepts.rst:447:- :class:`~airflow.providers.mysql.operators.mysql.MySqlOperator`
   concepts.rst:448:- :class:`~airflow.providers.postgres.operators.postgres.PostgresOperator`
   concepts.rst:449:- :class:`~airflow.providers.microsoft.mssql.operators.mssql.MsSqlOperator`
   concepts.rst:450:- :class:`~airflow.providers.oracle.operators.oracle.OracleOperator`
   concepts.rst:451:- :class:`~airflow.providers.jdbc.operators.jdbc.JdbcOperator`
   concepts.rst:452:- :class:`~airflow.providers.docker.operators.docker.DockerOperator`
   concepts.rst:453:- :class:`~airflow.providers.apache.hive.operators.hive.HiveOperator`
   concepts.rst:454:- :class:`~airflow.providers.amazon.aws.operators.s3_file_transform.S3FileTransformOperator`
   concepts.rst:455:- :class:`~airflow.providers.mysql.transfers.presto_to_mysql.PrestoToMySqlOperator`,
   concepts.rst:456:- :class:`~airflow.providers.slack.operators.slack.SlackAPIOperator`
   concepts.rst:459:at :doc:`apache-airflow-providers:index`.
   concepts.rst:795:``conn_id`` for the :class:`~airflow.providers.postgres.hooks.postgres.PostgresHook` is
   howto/connection.rst:332:can also add a custom provider that adds custom connection types. See :doc:`apache-airflow-providers:index`
   howto/connection.rst:339::py:class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook`.
   howto/connection.rst:350:You can read more about details how to add custom provider packages in the :doc:`apache-airflow-providers:index`
   howto/custom-operator.rst:101:See :doc:`connection` for how to create and manage connections and :doc:`apache-airflow-providers:index` for
   howto/custom-operator.rst:268:is :class:`airflow.providers.google.cloud.sensors.gcs.GCSUploadSessionCompleteSensor`.
   howto/define_extra_link.rst:66:all the operators through an airflow plugin or through airflow providers. You can learn more about it in the
   howto/define_extra_link.rst:67::ref:`plugin example <plugin-example>` and in :doc:`apache-airflow-providers:index`.
   howto/define_extra_link.rst:77:tasks using :class:`~airflow.providers.amazon.aws.transfers.gcs_to_s3.GCSToS3Operator` operator.
   howto/define_extra_link.rst:86:  from airflow.providers.amazon.aws.transfers.gcs_to_s3 import GCSToS3Operator
   howto/define_extra_link.rst:112::class:`~airflow.providers.google.cloud.operators.bigquery.BigQueryExecuteQueryOperator` includes a link to the Google Cloud
   howto/define_extra_link.rst:119:    from airflow.providers.google.cloud.operators.bigquery import BigQueryOperator
   howto/define_extra_link.rst:145:As explained in :doc:`apache-airflow-providers:index`, when you create your own Airflow Provider, you can
   howto/define_extra_link.rst:150:by ``apache-airflow-providers-google`` provider currently:
   howto/define_extra_link.rst:155:      - airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleLink
   howto/define_extra_link.rst:156:      - airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleIndexableLink
   howto/define_extra_link.rst:157:      - airflow.providers.google.cloud.operators.mlengine.AIPlatformConsoleLink
   howto/email-config.rst:76:      email_backend = airflow.providers.sendgrid.utils.emailer.send_email
   installation.rst:88:has a corresponding ``apache-airflow-providers-amazon`` providers package to be installed. When you install
   installation.rst:106:see: :doc:`apache-airflow-providers:index`
   installation.rst:108:For the list of the provider packages and what they enable, see: :doc:`apache-airflow-providers:packages-ref`.
   modules_management.rst:272:    apache-airflow-providers-amazon           | 1.0.0b2
   modules_management.rst:273:    apache-airflow-providers-apache-cassandra | 1.0.0b2
   modules_management.rst:274:    apache-airflow-providers-apache-druid     | 1.0.0b2
   modules_management.rst:275:    apache-airflow-providers-apache-hdfs      | 1.0.0b2
   modules_management.rst:276:    apache-airflow-providers-apache-hive      | 1.0.0b2
   operators-and-hooks-ref.rst:23::doc:`apache-airflow-providers:operators-and-hooks-ref/index`.
   plugins.rst:169:    from airflow.providers.amazon.aws.transfers.gcs_to_s3 import GCSToS3Operator
   production-deployment.rst:762:Some operators, such as :class:`airflow.providers.google.cloud.operators.kubernetes_engine.GKEStartPodOperator`,
   production-deployment.rst:763::class:`airflow.providers.google.cloud.operators.dataflow.DataflowStartSqlJobOperator`, require
   production-deployment.rst:869:If you want to establish an SSH connection to the Compute Engine instance, you must have the network address of this instance and credentials to access it. To simplify this task, you can use :class:`~airflow.providers.google.cloud.hooks.compute.ComputeEngineHook` instead of :class:`~airflow.providers.ssh.hooks.ssh.SSHHook`
   production-deployment.rst:871:The :class:`~airflow.providers.google.cloud.hooks.compute.ComputeEngineHook` support authorization with Google OS Login service. It is an extremely robust way to manage Linux access properly as it stores short-lived ssh keys in the metadata service, offers PAM modules for access and sudo privilege checking and offers nsswitch user lookup into the metadata service as well.
   security/secrets/secrets-backend/index.rst:69:    airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
   upgrading-to-2.rst:117:    from airflow.providers.docker.operators.docker import DockerOperator
   upgrading-to-2.rst:126:automatically installs the ``apache-airflow-providers-docker`` package.
   upgrading-to-2.rst:131:You can read more about providers at :doc:`apache-airflow-providers:index`.
   upgrading-to-2.rst:507:  * You can read more about providers at :doc:`apache-airflow-providers:index`.
   ```
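
   For context, these `:doc:` and `:class:` references resolve across the separately built doc packages via Sphinx's intersphinx extension, which is why building a single package in isolation is not enough. A minimal, illustrative conf.py sketch (the URLs and the exact mapping are assumptions for illustration, not copied from our docs config):

   ```python
   # conf.py -- minimal intersphinx sketch (illustrative, not the exact Airflow config)
   extensions = ["sphinx.ext.intersphinx"]

   # Map the package names used in :doc:/:class: cross-references to the published
   # objects.inv inventories of those packages, so references resolve across packages.
   intersphinx_mapping = {
       "apache-airflow-providers": (
           "https://airflow.apache.org/docs/apache-airflow-providers/stable/", None),
       "apache-airflow-providers-google": (
           "https://airflow.apache.org/docs/apache-airflow-providers-google/stable/", None),
   }
   ```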
   
   >I firmly believe we should optimize those things that have higher impact 
and bring more benefits. I hate doing micro-optimisations, I always think 
"long-term"/"high impact" when I am doing it.
   
   We should not fall into the trap of over-optimisation though. We can still get a good reduction in time (which will bring more benefits and hopefully higher impact), as we won't be building docs for all providers -- only `apache-airflow`, `apache-airflow-providers` and `apache-airflow-providers-PROVIDER_THAT_CHANGED`.
   
   >WDYT @kaxil @mik-laj which of those are worth it? (1) or (2)? I argue 
(above) that (2) has much higher impact and brings more benefits to the 
community as whole. If you think we should do (1) rather than (2), I'd love to 
hear your line of thoughts an reasoning why you think it is better to do it.
   
   As I said above, I would vouch for (1) and (3) rather than (2), or just (3) on its own, since it would supersede (1) too.
   
   
   
   

