dabla opened a new issue, #36732:
URL: https://github.com/apache/airflow/issues/36732

   ### Apache Airflow Provider(s)
   
   http
   
   ### Versions of Apache Airflow Providers
   
   2.7.3
   
   ### Apache Airflow version
   
   2.7.3
   
   ### Operating System
   
   RedHat
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   As we are using Airflow behind a corporate HTTP proxy, we need to pass 
dedicated proxy definitions for each HTTP call depending on which endpoint we 
want to invoke as those have different permissions.  For the ease of use, we 
define the proxy definition as a dict within the extra_options field of the 
HTTP Connection managed by Airflow.  Unfortunately, the HttpHook assumes that 
all values defined in the extra_options fields are HTTP headers which is not 
the case.  Following parameters are only useful for the SimpleHttpOperator 
itself but not for the HttpHook:
   
   - proxies
   - stream
   - verify
   - cert
   - max_redirects
   
   As in our case the proxies parameter contains a dict, when the HttpHook 
tries to add the proxies as a HTTP header we get an InvalidHeader exception as 
a the value must be of type string or bytes:
   
       raise InvalidHeader( requests.exceptions.InvalidHeader: Header part 
({'http': 'http://proxy', 'https': 'http://proxy'}) from ('proxies', {'http': 
'http://proxy', 'https': 'http://proxy'}) must be of type str or bytes, not 
<class 'dict'>
   
   To avoid such issues, I pop the Session related request parameters from the 
extra_options and assign them to the instantiated requests Session before 
merging those as HTTP headers. That way the HttpHook gets correctly 
instantiated but also further on the HttpOperator will automatically have a 
configured session without the need to specify those parameters in time when 
using it in a DAG which is a cleaner approach.  That way proxies and other 
requests Session related config parameters are defined once in the Airflow 
connections, if we need to adapt a parameter for a particular connection we 
only need to do this once instead of changing each operator definition within 
the DAG.
   
   You could argue that the same can be achieved through environment variables, 
but the issue is here that we have different proxy definitions depending on the 
endpoint we want to invoke as those have different permissions.  We have also 
implemented a custom Airflow Secrets provider base on Keeper in which we define 
all our connections.  That way it is easy to manage all connections and have 
them securely stored as those contain credentials at one place.  I'll probably 
also start a discussion concerning this Keeper secret provider once the last 
finishes touches are done.
   
   I've already created a branch and coded a solution for this with the 
appropriate unit tests which tests each scenario.
   
   ### What you think should happen instead
   
   The extra_options should not only contains parameters for HTTP headers in 
case of a HTTP connection, it could be useful if we could also define request 
related config parameters.
   
   ### How to reproduce
   
   Just define a proxies with a dict in the extra_options fields of a HTTP 
connection and the SimpleHttpOperator will fail as it will try to instantiate 
an HttpHook with a HTTP header named proxies and a dict as value which will 
raise an InvalidHeader exception.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to