dabla opened a new issue, #36732:
URL: https://github.com/apache/airflow/issues/36732
### Apache Airflow Provider(s)
http
### Versions of Apache Airflow Providers
2.7.3
### Apache Airflow version
2.7.3
### Operating System
RedHat
### Deployment
Other 3rd-party Helm chart
### Deployment details
_No response_
### What happened
As we are using Airflow behind a corporate HTTP proxy, we need to pass
dedicated proxy definitions for each HTTP call depending on which endpoint we
want to invoke, as those have different permissions. For ease of use, we
define the proxy definition as a dict within the extra_options field of the
HTTP Connection managed by Airflow. Unfortunately, the HttpHook assumes that
all values defined in the extra_options field are HTTP headers, which is not
the case. The following parameters are only useful for the SimpleHttpOperator
itself, not for the HttpHook:
- proxies
- stream
- verify
- cert
- max_redirects
In our case the proxies parameter contains a dict, so when the HttpHook
tries to add the proxies as an HTTP header we get an InvalidHeader exception,
as the value must be of type str or bytes:
```
raise InvalidHeader(
requests.exceptions.InvalidHeader: Header part ({'http': 'http://proxy', 'https': 'http://proxy'}) from ('proxies', {'http': 'http://proxy', 'https': 'http://proxy'}) must be of type str or bytes, not <class 'dict'>
```
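The failure can be simulated without Airflow. This sketch (function name and
structure are mine, not the hook's actual code) mimics the current behaviour of
merging every extra_options entry into the request headers, which is exactly
where a dict-valued proxies entry trips requests' type check:

```python
def merge_extras_as_headers(extra_options):
    """Mimic the current HttpHook behaviour: treat every extra as a header."""
    headers = {}
    for name, value in extra_options.items():
        # requests enforces that header values are str or bytes and raises
        # InvalidHeader otherwise; we reproduce an equivalent check here.
        if not isinstance(value, (str, bytes)):
            raise TypeError(
                f"Header part ({value!r}) from ({name!r}, {value!r}) "
                f"must be of type str or bytes, not {type(value)}"
            )
        headers[name] = value
    return headers
```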
To avoid such issues, I pop the Session-related request parameters from the
extra_options and assign them to the instantiated requests Session before
merging the remainder as HTTP headers. That way the HttpHook gets correctly
instantiated, and further on the HttpOperator will automatically have a
configured session without the need to specify those parameters each time it
is used in a DAG, which is a cleaner approach. Proxies and other requests
Session-related config parameters are then defined once in the Airflow
connections; if we need to adapt a parameter for a particular connection, we
only need to do this in one place instead of changing each operator definition
within the DAG.
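A minimal sketch of the popping approach described above (helper name and key
list are illustrative, not the actual patch): session-level keys are separated
out before the remainder is merged as headers, and each popped value could then
be assigned to the matching requests.Session attribute:

```python
# requests.Session-level parameters that are not HTTP headers (assumed list).
SESSION_KEYS = ("proxies", "stream", "verify", "cert", "max_redirects")

def split_extra_options(extra_options):
    """Split connection extras into requests.Session config and HTTP headers."""
    extras = dict(extra_options)  # copy so the caller's dict is untouched
    # Pop the Session-level parameters; each could then be applied as, e.g.,
    # session.proxies = session_config["proxies"].
    session_config = {key: extras.pop(key) for key in SESSION_KEYS if key in extras}
    # Whatever remains is merged as plain HTTP headers, all str values.
    return session_config, extras
```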
You could argue that the same can be achieved through environment variables,
but the issue here is that we have different proxy definitions depending on
the endpoint we want to invoke, as those have different permissions. We have
also implemented a custom Airflow Secrets provider based on Keeper in which we
define all our connections. That way it is easy to manage all connections,
which contain credentials, and have them securely stored in one place. I'll
probably also start a discussion concerning this Keeper secrets provider once
the last finishing touches are done.
I've already created a branch and coded a solution for this, with appropriate
unit tests covering each scenario.
### What you think should happen instead
The extra_options field should not only contain HTTP header parameters in the
case of an HTTP connection; it would be useful if we could also define
request-related config parameters there.
### How to reproduce
Just define a proxies key with a dict value in the extra_options field of an
HTTP connection; the SimpleHttpOperator will fail because it will try to
instantiate an HttpHook with an HTTP header named proxies whose value is a
dict, which raises an InvalidHeader exception.
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]