[
https://issues.apache.org/jira/browse/AIRFLOW-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833032#comment-16833032
]
Jarek Potiuk commented on AIRFLOW-2910:
---------------------------------------
I think we should improve this in Airflow 2.0, possibly in a
non-backwards-compatible way, and return to a more reasonable approach where we
can explicitly specify https:// rather than a URL-escaped URI. The
http://https%3a... form raises what's left of the remaining hair on top of my
head.
Is there anything preventing us from adding https:// to the list of allowed
schemas?
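To make the problem concrete, here is a minimal sketch that uses only the
standard library's urllib.parse. It is not Airflow's actual parsing code; the
helper name split_conn_uri and the returned keys are assumptions chosen to
mirror the fields discussed in this issue (conn_type, host, schema).

```python
from urllib.parse import unquote, urlparse


def split_conn_uri(uri):
    """Hypothetical helper: split a connection URI into the fields
    discussed in this issue. Not Airflow's real implementation."""
    parts = urlparse(uri)
    return {
        # The URI scheme is what ends up as the connection type.
        "conn_type": parts.scheme,
        # The hostname is percent-decoded, so an escaped URL survives here.
        "host": unquote(parts.hostname) if parts.hostname else None,
        # The issue reports that schema is taken from the path minus "/".
        "schema": parts.path[1:],
    }


# A plain https URI: the scheme lands in conn_type and schema stays empty.
plain = split_conn_uri("https://yourdomain.com")

# The URL-escaped form: the full https URL is smuggled in as the host.
escaped = split_conn_uri("http://https%3A%2F%2Fyourdomain.com")
```

In this sketch the plain URI yields an empty schema, which matches the
behaviour reported below, while the escaped URI only works because the decoded
host happens to contain "://".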
Also, maybe this is a deeper subject and we should consider a more serious
change for Airflow 2.0:
[~kamil.bregula] already introduced parse_netloc_to_hostname in AIRFLOW-3615 to
solve the lowercasing of the host, but maybe we should extend it to parse the
URI even better.
I believe it might relate quite a bit to how the Connection.get_hook() method
uses the schema to derive the hook type from the connection. I never really
understood its purpose (maybe I am missing something). It is kind of weird to
have it in one place for all connection schemas, as it is only implemented for
some types and not for all of them, and it is (I believe) rarely used, as most
operators instantiate the exact hook they need.
Similarly, extra parameters in the UI are defined in connection_form.js for
just a few types. Maybe we can sort it out for 2.0 once and for all and define
a "discoverable" connection type object that provides information about the
type of connection, its extra fields and the schema(s) it supports, and that
can be dynamically loaded in the UI and in the Python code rather than
hard-coded in connection_form.js?
What I think might make sense is to have separate connection "types"
("GCP connection", "AWS connection", "Qubole connection", "Postgres
connection", "Mysql connection" etc.), and they should be selectable from the
UI rather than schemas as it is now. I think there is not always a 1-1 mapping
between the connection type and the schema: for example, http:// or https://
can likely be used in a number of connection types. We could likely make such
a change fairly easily, as there are still not that many "special" connection
types so far. It would also open the possibility of a nicer way of entering DB
connections (postgres/mysql extra parameters might be fairly complex).
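As a rough illustration of the "discoverable" connection type idea, here is a
minimal sketch. Everything in it is hypothetical: the ConnectionType class, the
register/types_for_scheme/describe_all helpers, the "example_rest_api" type and
the extra field names are made up for this example and do not exist in Airflow.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ConnectionType:
    """Hypothetical description of one connection type."""
    name: str                           # internal identifier
    label: str                          # text shown in the UI selector
    schemes: Tuple[str, ...] = ()       # URI schemes this type accepts
    extra_fields: Tuple[str, ...] = ()  # names of the extra parameters


_REGISTRY: Dict[str, ConnectionType] = {}


def register(conn_type: ConnectionType) -> ConnectionType:
    """Add a connection type to the registry, rejecting duplicates."""
    if conn_type.name in _REGISTRY:
        raise ValueError("duplicate connection type: " + conn_type.name)
    _REGISTRY[conn_type.name] = conn_type
    return conn_type


def types_for_scheme(scheme: str) -> List[str]:
    """Return the names of all types accepting a scheme (not always 1-1)."""
    return sorted(t.name for t in _REGISTRY.values() if scheme in t.schemes)


def describe_all() -> List[dict]:
    """Return plain dicts that a UI could load instead of hard-coded JS."""
    return [asdict(t) for t in sorted(_REGISTRY.values(), key=lambda t: t.name)]


# Illustrative registrations; the field names are assumptions.
register(ConnectionType("http", "HTTP connection", ("http", "https")))
register(ConnectionType("example_rest_api", "Example REST API connection",
                        ("http", "https"), ("api_version",)))
register(ConnectionType("postgres", "Postgres connection",
                        ("postgres", "postgresql"), ("sslmode",)))
```

With such a registry, both the UI and the Python code could ask one place which
types exist, and a scheme such as https could map to several connection types.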
What do you think?
> models.Connection cannot use https
> ----------------------------------
>
> Key: AIRFLOW-2910
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2910
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: isaac martin
> Priority: Major
>
> The SimpleHttpOperator, and anything else relying on
> airflow.models.Connection, cannot make use of https due to what appears to be
> a bug in the way it parses user-provided URLs. The bug ends up replacing any
> https URI with an http URI.
> To reproduce:
> * Create a new airflow implementation.
> * Set a connection environment var:
> AIRFLOW_CONN_ETL_API=https://yourdomain.com
> * Instantiate a SimpleHttpOperator which uses the above for its http_conn_id
> argument.
> * Notice with horror that your requests are made to http://yourdomain.com
> To fix:
> Proposal 1
> Line 590 of airflow.models.py assigns nothing to Connection.schema.
> Change:
> self.schema = temp_uri.path[1:]
> to
> self.schema = temp_uri[0]
>
> Proposal 2:
> Line 40 of airflow.hooks.http_hook.py starts a block which tries to set the
> base_url. We could add a new elif which checks conn.conn_type, as
> conn.conn_type is correctly populated with 'https'.
> For example:
> elif conn.conn_type:
> self.base_url = conn.conn_type + "://" + conn.host