AlexisBRENON commented on code in PR #38071:
URL: https://github.com/apache/airflow/pull/38071#discussion_r1528599168
##########
contributing-docs/07_local_virtualenv.rst:
##########
@@ -315,7 +315,7 @@ When you install airflow from sources using editable install, you can develop to
 of Airflow and providers, which is pretty convenient, because you can use the same environment for both.
-Running ``pipinstall -e .`` will install Airflow in editable mode, but all provider code will also be
+Running ``pip install -e .`` will install Airflow in editable mode, but all provider code will also be
Review Comment:
I will move it to another PR :+1:
##########
airflow/providers/google/cloud/log/stackdriver_task_handler.py:
##########
@@ -99,7 +99,7 @@ def __init__(
super().__init__()
self.gcp_key_path: str | None = gcp_key_path
self.scopes: Collection[str] | None = scopes
- self.name: str = name
+ self.gcp_log_name: str = name
Review Comment:
Sorry, maybe I forgot to add some context ^^
So, Airflow relies on the [`logging.config.dictConfig`](https://docs.python.org/3/library/logging.config.html) method to [set up the logging stack](https://github.com/apache/airflow/blob/a58441ca1b263cae61a5bb653e6839f0dd29b08e/airflow/logging_config.py#L69). However, during this setup, it iterates through the handlers and [explicitly sets their name](https://github.com/python/cpython/blob/2a4cbf17af19a01d942f9579342f77c39fbd23c4/Lib/logging/config.py#L578):
```py
for name in sorted(handlers):
    try:
        handler = self.configure_handler(handlers[name])
        handler.name = name
        handlers[name] = handler
```
So, before this fix:
1. Set up remote logging through the environment variables `AIRFLOW__LOGGING__REMOTE_LOGGING="true"` and `AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER="stackdriver://host/path"`
2. Airflow instantiates a `StackdriverTaskHandler` with the name `"path"`
3. **BUT** the `dictConfig` call overrides the name of the handler with the key of the handlers configuration (i.e. [`task`](https://github.com/apache/airflow/blob/a58441ca1b263cae61a5bb653e6839f0dd29b08e/airflow/config_templates/airflow_local_settings.py#L350)).
4. Hence, subsequent calls to the handler's `emit` method write logs to the wrong destination (`task` instead of `path`).

Changing the field from `name` to `gcp_log_name` prevents the overriding by `dictConfig`. For me, it is a bug fix (and not a change to prevent potential future bugs), but as you noted, it breaks backward compatibility. Should we still go through a deprecation period, or should we update it ASAP?
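The name override described above can be reproduced with plain `logging.config.dictConfig`; here is a minimal sketch, where `DemoHandler` is a made-up stand-in for `StackdriverTaskHandler` (not the actual class):

```python
import logging
import logging.config

class DemoHandler(logging.StreamHandler):
    """Made-up stand-in for StackdriverTaskHandler: it sets its own name."""
    def __init__(self):
        super().__init__()
        self.name = "path"  # what the handler wants to be called

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    # The key "task" mirrors the key used in Airflow's handlers config.
    "handlers": {"task": {"()": DemoHandler}},
    "root": {"handlers": ["task"], "level": "INFO"},
})

handler = logging.getLogger().handlers[0]
print(handler.name)  # prints "task": dictConfig overwrote the handler's own name
```

The override happens regardless of what the handler assigned to `self.name` in `__init__`, which is why renaming the attribute sidesteps it.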
##########
docs/apache-airflow-providers-google/logging/stackdriver.rst:
##########
@@ -50,7 +50,7 @@ Turning this option off will result in data not being sent to Stackdriver.
 The ``remote_base_log_folder`` option contains the URL that specifies the type of handler to be used.
 For integration with Stackdriver, this option should start with ``stackdriver://``.
-The path section of the URL specifies the name of the log e.g. ``stackdriver://airflow-tasks`` writes
+The path section of the URL specifies the name of the log e.g. ``stackdriver:///airflow-tasks`` writes
Review Comment:
As you can see, the documentation states that "The path section of the URL specifies the name of the log". However, the URL `stackdriver://airflow-tasks` (with only two `/`) is parsed with an empty `path` component, so the name of the log ends up being an empty string (and therefore falls back to the default value). With the extra `/`, the URL is parsed as expected, using `airflow-tasks` as the name of the log:
```py
>>> from urllib.parse import urlsplit
>>> urlsplit("stackdriver://airflow-tasks")
SplitResult(scheme='stackdriver', netloc='airflow-tasks', path='', query='', fragment='')
>>> urlsplit("stackdriver:///airflow-tasks")
SplitResult(scheme='stackdriver', netloc='', path='/airflow-tasks', query='', fragment='')
```
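To illustrate the fallback behaviour, here is a hedged sketch: the helper `log_name_from_url` and the default `"airflow-tasks"` are hypothetical, not the provider's actual API.

```python
from urllib.parse import urlsplit

def log_name_from_url(url: str, default: str = "airflow-tasks") -> str:
    # Hypothetical helper: take the log name from the URL's path component,
    # falling back to a default when the path is empty.
    name = urlsplit(url).path.strip("/")
    return name or default

# Two slashes: the name lands in netloc, path is empty, so the default wins.
print(log_name_from_url("stackdriver://my-log"))   # prints "airflow-tasks"
# Three slashes: the path is "/my-log", so the intended name is used.
print(log_name_from_url("stackdriver:///my-log"))  # prints "my-log"
```

This is why the documented example needs the third `/`: only then does the intended name survive URL splitting.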
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]