jscheffl commented on code in PR #39510:
URL: https://github.com/apache/airflow/pull/39510#discussion_r1595964547


##########
airflow/config_templates/config.yml:
##########
@@ -2591,3 +2591,23 @@ sensors:
       type: float
       example: ~
       default: "604800"
+scarf_analytics:
+  description: |
+    Airflow integrates `Scarf <https://about.scarf.sh/>`__ to collect basic telemetry data during operation.
+    This data assists Airflow maintainers in better understanding how Airflow is used.
+    Insights gained from this telemetry are critical for prioritizing patches, minor releases, and
+    security fixes. Additionally, this information supports key decisions related to the development roadmap.

Review Comment:
   As in `docs/apache-airflow/faq.rst`, you are not listing _what_ is going to
   be collected. For the sake of transparency, I'd propose we name it here.
   Otherwise, if you fear redundancy, I'd propose pointing to the FAQ in
   `docs/apache-airflow/faq.rst`.



##########
airflow/www/views.py:
##########
@@ -1034,6 +1035,30 @@ def _iter_parsed_moved_data_table_names():
                     "warning",
                 )
 
+        scarf_url = ""
+        if settings.IS_SCARF_ANALYTICS_ENABLED:
+            scarf_domain = "https://apacheairflow.gateway.scarf.sh"
+
+            python_version = platform.python_version()
+            platform_sys = platform.system()
+            platform_arch = platform.machine()
+            db_name = settings.engine.dialect.name
+            db_version = settings.engine.dialect.server_version_info
+            if db_version:
+                # Example: (1, 2, 3) -> "1.2.3"
+                db_version = ".".join(map(str, db_version))
+            executor = conf.get("core", "EXECUTOR")
+
+            # Path Format:
+            # /{version}/{python_version}/{platform}/{arch}/{database}/{db_version}/{executor}/{num_dags}
+            #
+            # This path redirects to a Pixel tracking URL
+            scarf_url = (
+                f"{scarf_domain}/webserver"
+                f"/{version}/{python_version}"
+                f"/{platform_sys}/{platform_arch}/{db_name}/{db_version}/{executor}/{all_dags_count}"
+            )

Review Comment:
   These lines seem largely redundant with the scheduler helper function in
   `airflow/utils/scarf.py`. I'd propose making the utility generic and using it
   in both places to avoid duplicating the logic.
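
A minimal sketch of what such a shared helper could look like, assuming the path layout quoted in the hunk above. The names `build_scarf_url`, `format_db_version`, and the `component` parameter are hypothetical and do not come from the PR; only the domain and the segment order are taken from the diff.

```python
from typing import Optional, Sequence

# Domain as quoted in the diff hunk above.
SCARF_DOMAIN = "https://apacheairflow.gateway.scarf.sh"


def format_db_version(version_info: Optional[Sequence[object]]) -> Optional[str]:
    """Turn a version tuple such as (1, 2, 3) into "1.2.3"; return None if absent."""
    if not version_info:
        return None
    return ".".join(map(str, version_info))


def build_scarf_url(
    component: str,  # hypothetical: e.g. "webserver", as in the quoted code
    *,
    version: str,
    python_version: str,
    platform_sys: str,
    platform_arch: str,
    db_name: str,
    db_version: Optional[str],
    executor: str,
    num_dags: int,
) -> str:
    """Build the tracking URL in the segment order used by the quoted f-string."""
    segments = [
        version,
        python_version,
        platform_sys,
        platform_arch,
        db_name,
        db_version,  # a missing version renders as "None", like the f-string would
        executor,
        num_dags,
    ]
    return f"{SCARF_DOMAIN}/{component}/" + "/".join(str(s) for s in segments)


url = build_scarf_url(
    "webserver",
    version="2.10.0",
    python_version="3.11.4",
    platform_sys="Linux",
    platform_arch="x86_64",
    db_name="postgresql",
    db_version=format_db_version((14, 2)),
    executor="LocalExecutor",
    num_dags=5,
)
print(url)
```

With something along these lines, the view would only gather its values and pass its own component name, so the path format lives in one place.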



##########
airflow/config_templates/config.yml:
##########
@@ -2591,3 +2591,23 @@ sensors:
       type: float
       example: ~
       default: "604800"
+scarf_analytics:

Review Comment:
   The property name refers directly to Scarf. I know we decided to use it (for
   the moment), but we might switch to something else in the future. I'd
   therefore propose not binding the config entry to the name of the backend
   service/product, and renaming it to something neutral, e.g.:
   ```suggestion
   analytics_collection:
   ```
   Other references in the code would of course need to be adjusted as well.



##########
docs/apache-airflow/faq.rst:
##########
@@ -522,3 +522,15 @@ This means ``explicit_defaults_for_timestamp`` is disabled in your mysql server
 
 #. Set ``explicit_defaults_for_timestamp = 1`` under the ``mysqld`` section in your ``my.cnf`` file.
 #. Restart the Mysql server.
+
+Does Airflow collect any telemetry data?
+----------------------------------------
+
+Airflow integrates `Scarf <https://about.scarf.sh/>`__ to collect basic telemetry data during operation.
+This data assists Airflow maintainers in better understanding how Airflow is used.
+Insights gained from this telemetry are critical for prioritizing patches, minor releases, and
+security fixes. Additionally, this information supports key decisions related to the development roadmap.
+
+Users can easily opt-out of analytics in various ways documented
+`here <https://docs.scarf.sh/gateway/#do-not-track>`__ and by setting the :ref:`config:scarf_analytics__enabled` option
+to ``False``. Airflow also respects the ``SCARF_ANALYTICS=false`` environment variable.

Review Comment:
   Can you please add a list of the metrics that are currently collected?
   Otherwise users would need to look at the codebase to find out. Listing
   them here would also be very good for transparency.
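
For readers of the FAQ hunk above, the two opt-out switches it mentions (the config option and the ``SCARF_ANALYTICS=false`` environment variable) could combine roughly as sketched below. This is an illustration under assumptions, not the PR's implementation: the function name `analytics_enabled` is hypothetical, and the precedence shown (the environment variable can only disable, never force-enable) is a guess.

```python
import os
from typing import Mapping, Optional


def analytics_enabled(
    config_enabled: bool,
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only if neither the config option nor the env var opts out.

    config_enabled stands in for the value of the config option quoted in the
    FAQ text; how it is read from the Airflow config is left out here.
    """
    env = os.environ if environ is None else environ
    # Assumption: the comparison is case-insensitive and ignores whitespace.
    if env.get("SCARF_ANALYTICS", "").strip().lower() == "false":
        return False
    return config_enabled


print(analytics_enabled(True, {"SCARF_ANALYTICS": "false"}))
```

Passing the environment as a mapping keeps the check easy to unit test without touching the real process environment.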


