VladaZakharova commented on code in PR #66342:
URL: https://github.com/apache/airflow/pull/66342#discussion_r3257007947

##########
providers/openlineage/src/airflow/providers/openlineage/token_provider.py:
##########
@@ -0,0 +1,126 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from typing import Any
+
+from airflow.providers.common.compat.sdk import AirflowException, BaseHook
+
+AIRFLOW_CONNECTION_API_KEY_AUTH_TYPE = "airflow_connection_api_key"
+OPENLINEAGE_CONFIG_EXTRA_KEY = "openlineage_config"
+_DEFAULT_EXTRA_KEYS = ("apiKey", "api_key", "apikey", "token", "access_token")
+
+
+class OpenLineageAirflowConnectionAuthError(AirflowException):
+    """Raised when OpenLineage API key auth cannot be resolved from an Airflow connection."""
+
+
+class OpenLineageAirflowConnectionConfigError(AirflowException):
+    """Raised when OpenLineage config cannot be resolved from an Airflow connection."""

Review Comment:
   I agree that a dedicated openlineage connection type would be nicer for users. I’m just not sure we should add it in this PR, because it feels like a separate feature from loading the config from a connection.

   For now I updated the docs and provider metadata to say that config_conn_id should point to a Generic Airflow connection. That at least gives users a clear choice instead of “pick any connection type”.

   We can add a proper OpenLineage connection type later in a separate PR. WDYT?


##########
providers/openlineage/src/airflow/providers/openlineage/token_provider.py:
##########
@@ -0,0 +1,126 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from typing import Any
+
+from airflow.providers.common.compat.sdk import AirflowException, BaseHook
+
+AIRFLOW_CONNECTION_API_KEY_AUTH_TYPE = "airflow_connection_api_key"
+OPENLINEAGE_CONFIG_EXTRA_KEY = "openlineage_config"
+_DEFAULT_EXTRA_KEYS = ("apiKey", "api_key", "apikey", "token", "access_token")
+
+
+class OpenLineageAirflowConnectionAuthError(AirflowException):
+    """Raised when OpenLineage API key auth cannot be resolved from an Airflow connection."""
+
+
+class OpenLineageAirflowConnectionConfigError(AirflowException):
+    """Raised when OpenLineage config cannot be resolved from an Airflow connection."""
+
+
+class AirflowConnectionConfigProvider:
+    """
+    Resolve OpenLineage client configuration from an Airflow connection.
+
+    The connection extra can contain the full OpenLineage client config, for example
+    ``{"transport": {"type": "console"}}``. For convenience, it can also contain only the transport
+    config, for example ``{"type": "console"}``.
+    """
+
+    def __init__(self, conn_id: str) -> None:
+        if not conn_id:
+            raise OpenLineageAirflowConnectionConfigError(
+                "OpenLineage connection config requires a non-empty connection ID."
+            )
+        self.conn_id = conn_id
+
+    def get_config(self) -> dict[str, Any]:
+        connection = BaseHook.get_connection(self.conn_id)
+        extra = connection.extra_dejson
+        config = self._get_config_from_extra(extra)
+        if config is not None:
+            return config
+
+        raise OpenLineageAirflowConnectionConfigError(
+            "OpenLineage connection config could not find configuration in connection "
+            f"`{self.conn_id}`. Expected full OpenLineage config or transport config in connection extra."
+        )
+
+    def _get_config_from_extra(self, extra: dict[str, Any]) -> dict[str, Any] | None:
+        if OPENLINEAGE_CONFIG_EXTRA_KEY in extra:
+            return self._validate_config(extra[OPENLINEAGE_CONFIG_EXTRA_KEY])
+
+        if "transport" in extra:
+            return self._validate_config(extra)
+
+        if "type" in extra:
+            return {"transport": extra}
+
+        return None
+
+    def _validate_config(self, config: Any) -> dict[str, Any]:

Review Comment:
   I thought about using OpenLineageClient(config=...) for this, but I think it would be a bit too heavy for validation here. It would create the client/transport once just to check the config, and then we would create it again later in the adapter.

   So for now I kept this check very small: the Airflow connection extra must be a JSON object with a transport object. The OpenLineage client still does the real transport/auth validation when it is created.

   If the OpenLineage client gets a dedicated validation method later, we can switch to that.
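
For reference, the small shape check described in the last comment could look roughly like the sketch below. This is only an illustration, not the code from the PR: `ConfigShapeError` and `validate_config_shape` are hypothetical names standing in for `OpenLineageAirflowConnectionConfigError` and `_validate_config`, and the error messages are made up. It only checks that the value is a JSON object containing a `transport` object and leaves transport/auth validation to the OpenLineage client.

```python
from __future__ import annotations

from typing import Any


class ConfigShapeError(ValueError):
    """Hypothetical stand-in for OpenLineageAirflowConnectionConfigError."""


def validate_config_shape(config: Any) -> dict[str, Any]:
    """Check only the shape: a JSON object that contains a ``transport`` object."""
    # The connection extra (or its nested config value) must decode to a dict.
    if not isinstance(config, dict):
        raise ConfigShapeError("OpenLineage config must be a JSON object.")

    # The transport entry must itself be a JSON object; its contents are not inspected here.
    transport = config.get("transport")
    if not isinstance(transport, dict):
        raise ConfigShapeError("OpenLineage config must contain a `transport` object.")

    return config
```

With this sketch, `validate_config_shape({"transport": {"type": "console"}})` returns the dict unchanged, while a list, a string, or a dict whose `transport` is not an object raises `ConfigShapeError`.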
