[
https://issues.apache.org/jira/browse/AIRFLOW-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ash Berlin-Taylor resolved AIRFLOW-3615.
----------------------------------------
Resolution: Fixed
Fix Version/s: 1.10.3
> Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7
> -> 3.5 (but not in 3.6)
> ------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-3615
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3615
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Jarek Potiuk
> Assignee: Kamil Bregula
> Priority: Major
> Fix For: 1.10.3
>
>
> There is a problem with case sensitivity when parsing URIs for database
> connections that use local UNIX sockets rather than TCP connections.
> For local UNIX sockets, the hostname part of the URI contains the
> url-encoded local socket path rather than an actual hostname. If this
> path contains uppercase characters, urlparse deliberately lowercases them
> when parsing. This is perfectly fine for hostnames (according to
> [https://tools.ietf.org/html/rfc3986#section-6.2.3], case normalisation
> should be done for hostnames).
> However, urlparse still treats this part as a hostname when the URI does
> not contain a host but only a local path (i.e. when the location starts
> with %2F ("/")). What's more, the host gets converted to lowercase in
> python 2.7 - 3.5. Surprisingly, this is somewhat "fixed" in 3.6 (i.e. if
> the URL location starts with %2F, the hostname is no longer normalized to
> lowercase; see the snippets below showing the behaviour for different
> python versions).
> This problem bubbles up in Airflow's Connection. Airflow uses urlparse to
> get the hostname/path in models.py:parse_from_uri, and for UNIX sockets the
> socket path is read from the hostname. There is no other reliable way when
> using urlparse, because the location can also contain 'authority'
> (user/password) and it is urlparse's job to separate them out. Airflow's
> Connection similarly does not distinguish between TCP and local socket
> connections: it uses the host field to store the socket path (which is case
> sensitive, however). So you can use UPPERCASE when you define a connection
> in the database, but this is a problem when parsing connections from
> environment variables, because we currently cannot pass a URI whose socket
> path contains UPPERCASE characters.
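
To make the scenario above concrete, here is a minimal Python 3 sketch of how a socket path ends up in the host position of a connection URI and is read back through urlparse. The URI, credentials and socket path are made up for illustration, and this is not Airflow's actual parse_from_uri code.

```python
from urllib.parse import quote, unquote, urlparse

# Hypothetical socket path containing uppercase characters.
socket_path = "/tmp/MySocket.SOCK"

# The path is url-encoded (including "/") so it can sit in the host position.
uri = "mysql://user:secret@" + quote(socket_path, safe="") + "/mydb"

parsed = urlparse(uri)

# urlparse separates the user/password from the rest of the location.
username = parsed.username
password = parsed.password

# Reading the socket path back via .hostname is the step at risk: on the
# Python versions that lowercase the hostname, the case is lost here.
recovered = unquote(parsed.hostname)

print(uri)
print(recovered)
```

Whether `recovered` keeps its uppercase characters depends on the interpreter version, as the snippets in this report show.
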
> Since urlparse is really meant for parsing URLs and is not good at parsing
> non-URL URIs, we should probably use a different parser that handles more
> generic URIs, including not lowercasing the path, for all versions of
> python.
> I think we could also consider adding a local path to the Connection model
> and using it instead of hostname to store the socket path. This approach
> would be the "correct" one, but it might introduce some compatibility
> issues, so maybe it's not worth it, considering that host is case sensitive
> in Airflow.
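
One possible way to avoid the lowercasing without replacing urlparse entirely is to keep it for splitting the URI and, when the host turns out to be an encoded path, recover it from the raw netloc, whose case urlparse leaves untouched. The sketch below assumes Python 3; the helper name `host_preserving_case` and the example URIs are hypothetical, and this is not necessarily how the fix in Airflow is implemented.

```python
from urllib.parse import unquote, urlparse


def host_preserving_case(uri):
    """Return the host of `uri`, keeping the case of url-encoded socket paths.

    Hypothetical helper for illustration only.
    """
    parts = urlparse(uri)
    host = unquote(parts.hostname or "")
    if "/" not in host:
        # Ordinary hostname: lowercasing by urlparse is fine (RFC 3986, 6.2.3).
        return host
    # The host is really a path, so take it from netloc, whose case is untouched.
    raw = parts.netloc
    if "@" in raw:
        raw = raw.rsplit("@", 1)[1]  # drop "user:password@"
    if ":" in raw:
        raw = raw.split(":", 1)[0]  # drop ":port"
    return unquote(raw)


print(host_preserving_case("mysql://user:secret@%2Ftmp%2FMySocket.SOCK:3306/mydb"))
print(host_preserving_case("postgres://user@DB.Example.COM:5432/mydb"))
```

Real hostnames are still normalised to lowercase, while anything that decodes to a path is returned exactly as written in the URI.
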
> Snippets showing urlparse behaviour in different python versions:
> {quote}Python 2.7.10 (default, Aug 17 2018, 19:45:58)
> [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from urlparse import urlparse,unquote
> >>> conn = urlparse("http://AAA")
> >>> conn.hostname
> 'aaa'
> >>> conn = urlparse("http://%2FAAA")
> >>> conn.hostname
> '%2faaa'
> {quote}
>
> {quote}Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from urlparse import urlparse,unquote
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ImportError: No module named 'urlparse'
> >>> from urllib.parse import urlparse,unquote
> >>> conn = urlparse("http://AAA")
> >>> conn.hostname
> 'aaa'
> >>> conn = urlparse("http://%2FAAA")
> >>> conn.hostname
> '%2faaa'
> {quote}
>
> {quote}Python 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from urllib.parse import urlparse,unquote
> >>> conn = urlparse("http://AAA")
> >>> conn.hostname
> 'aaa'
> >>> conn = urlparse("http://%2FAAA")
> >>> conn.hostname
> {color:#ff0000}'%2FAAA'{color}
> {quote}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)