AlexisBRENON commented on code in PR #38071:
URL: https://github.com/apache/airflow/pull/38071#discussion_r1528599168


##########
contributing-docs/07_local_virtualenv.rst:
##########
@@ -315,7 +315,7 @@ When you install airflow from sources using editable install, you can develop to
 of Airflow and providers, which is pretty convenient, because you can use the same environment for both.
 
 
-Running ``pipinstall -e .`` will install Airflow in editable mode, but all provider code will also be
+Running ``pip install -e .`` will install Airflow in editable mode, but all provider code will also be

Review Comment:
   I will move it to another PR :+1: 



##########
airflow/providers/google/cloud/log/stackdriver_task_handler.py:
##########
@@ -99,7 +99,7 @@ def __init__(
         super().__init__()
         self.gcp_key_path: str | None = gcp_key_path
         self.scopes: Collection[str] | None = scopes
-        self.name: str = name
+        self.gcp_log_name: str = name

Review Comment:
   Sorry, maybe I forgot to add some context ^^
   So, Airflow relies on the [`logging.config.dictConfig`](https://docs.python.org/3/library/logging.config.html) method to [set up the logging stack](https://github.com/apache/airflow/blob/a58441ca1b263cae61a5bb653e6839f0dd29b08e/airflow/logging_config.py#L69).
   However, during this setup, it iterates through the handlers and [explicitly sets their name](https://github.com/python/cpython/blob/2a4cbf17af19a01d942f9579342f77c39fbd23c4/Lib/logging/config.py#L578):
   ```py
                   for name in sorted(handlers):
                       try:
                           handler = self.configure_handler(handlers[name])
                           handler.name = name
                           handlers[name] = handler
   ```
   
   So, before this fix:
   
   1. Set up remote logging through the environment variables `AIRFLOW__LOGGING__REMOTE_LOGGING="true"` and `AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER="stackdriver://host/path"`.
   2. Airflow instantiates a `StackdriverTaskHandler` with the name `"path"`.
   3. **BUT** the `dictConfig` call overrides the name of the handler with the key of the handlers configuration (i.e. [`task`](https://github.com/apache/airflow/blob/a58441ca1b263cae61a5bb653e6839f0dd29b08e/airflow/config_templates/airflow_local_settings.py#L350)).
   4. Hence, subsequent calls to the handler's `emit` method send logs to the wrong destination (`task` instead of `path`).
   
   Renaming the field from `name` to `gcp_log_name` prevents `dictConfig` from overriding it. To me this is a bug fix (and not a change to prevent potential future bugs), but as you noted, it breaks backward compatibility. Should we still go through a deprecation period, or should we update it ASAP?
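
The clobbering described above is easy to reproduce standalone. Here is a minimal sketch (the `DemoHandler` class is a stand-in I made up for `StackdriverTaskHandler`, which similarly stored its target in `name` before this fix):

```python
import logging
import logging.config

class DemoHandler(logging.Handler):
    """Stand-in for StackdriverTaskHandler: stores its log destination in `name`."""
    def __init__(self, name="airflow-tasks"):
        super().__init__()
        self.name = name  # dictConfig will clobber this attribute below

    def emit(self, record):
        pass

logging.config.dictConfig({
    "version": 1,
    "handlers": {
        # the dict key ("task") becomes handler.name after configuration,
        # regardless of what the factory kwargs set it to
        "task": {"()": DemoHandler, "name": "airflow-tasks"},
    },
    "root": {"handlers": ["task"], "level": "INFO"},
})

handler = logging.getLogger().handlers[0]
print(handler.name)  # prints "task": the configured "airflow-tasks" was overwritten
```

Storing the destination in an attribute `dictConfig` does not touch (e.g. `gcp_log_name`) sidesteps this entirely.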



##########
docs/apache-airflow-providers-google/logging/stackdriver.rst:
##########
@@ -50,7 +50,7 @@ Turning this option off will result in data not being sent to Stackdriver.
 
 The ``remote_base_log_folder`` option contains the URL that specifies the type of handler to be used.
 For integration with Stackdriver, this option should start with ``stackdriver://``.
-The path section of the URL specifies the name of the log e.g. ``stackdriver://airflow-tasks`` writes
+The path section of the URL specifies the name of the log e.g. ``stackdriver:///airflow-tasks`` writes

Review Comment:
   As you can see, the documentation states that "The path section of the URL specifies the name of the log". However, the URL `stackdriver://airflow-tasks` (with only two `/`) is parsed with an empty `path` component, so the name of the log ends up as an empty string (and the handler falls back to the default value). With the extra `/`, the URL is parsed as expected, using `airflow-tasks` as the name of the log:
   
   ```py
   >>> from urllib.parse import urlsplit
   >>> urlsplit("stackdriver://airflow-tasks")
   SplitResult(scheme='stackdriver', netloc='airflow-tasks', path='', query='', fragment='')
   >>> urlsplit("stackdriver:///airflow-tasks")
   SplitResult(scheme='stackdriver', netloc='', path='/airflow-tasks', query='', fragment='')
   ```
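
The path-based extraction with its empty-path fallback can be sketched as a small helper (the function name and the default constant are illustrative, not the provider's actual code):

```python
from urllib.parse import urlsplit

DEFAULT_LOG_NAME = "airflow-tasks"  # illustrative default, not the provider's constant

def log_name_from_url(url: str) -> str:
    """Return the log name encoded in the path of a stackdriver:// URL."""
    path = urlsplit(url).path.strip("/")
    # an empty path (two-slash form) falls back to the default log name
    return path or DEFAULT_LOG_NAME

print(log_name_from_url("stackdriver://airflow-tasks"))   # netloc only, empty path -> "airflow-tasks" via the default
print(log_name_from_url("stackdriver:///airflow-tasks"))  # "airflow-tasks", taken from the path
```

This makes the difference visible: with two slashes the intended name lands in `netloc` and is silently ignored, while the three-slash form puts it in `path` where it is actually read.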



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
