[ https://issues.apache.org/jira/browse/AIRFLOW-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731920#comment-16731920 ]
Jarek Potiuk edited comment on AIRFLOW-3615 at 1/2/19 10:34 AM:
----------------------------------------------------------------

In dj-database-url (a Django utility), this problem is solved somewhat hackishly:
[https://github.com/kennethreitz/dj-database-url/blob/master/dj_database_url.py]

{{# Handle postgres percent-encoded paths.}}
{{hostname = url.hostname or ''}}
{{if '%2f' in hostname.lower():}}
{{    # Switch to url.netloc to avoid lower cased paths}}
{{    hostname = url.netloc}}
{{    if "@" in hostname:}}
{{        hostname = hostname.rsplit("@", 1)[1]}}
{{    if ":" in hostname:}}
{{        hostname = hostname.split(":", 1)[0]}}
{{    hostname = hostname.replace('%2f', '/').replace('%2F', '/')}}

> Connection parsed from URI - case-insensitive UNIX socket paths in python 2.7 -> 3.5 (but not in 3.6)
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3615
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3615
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Jarek Potiuk
>            Priority: Major
>
> There is a case-sensitivity problem when parsing URIs for database
> connections that use local UNIX sockets rather than TCP connections.
> For local UNIX sockets, the hostname part of the URI contains the
> url-encoded local socket path rather than an actual hostname, and if this
> path contains uppercase characters, urlparse deliberately lowercases them
> when parsing.
> This is perfectly fine for hostnames (according to
> [https://tools.ietf.org/html/rfc3986#section-6.2.3], case normalisation
> should be done for hostnames).
> However, urlparse still uses hostname if the URI does not contain a host but
> only a local path (i.e. when the location starts with %2F ("/")). What's more,
> the host gets converted to lowercase in Python 2.7 - 3.5. Surprisingly, this
> is somewhat "fixed" in 3.6 (i.e. if the URL location starts with %2F, the
> hostname is no longer normalized to lowercase; see the snippets below
> showing the behaviour of different Python versions).
> In Airflow's Connection this problem bubbles up. Airflow uses urlparse to get
> the hostname/path in models.py:parse_from_uri, and for UNIX sockets this
> is done via hostname. There is no other reliable way when using urlparse,
> because the path can also contain 'authority' (user/password), and it is
> urlparse's job to separate them out. Airflow's Connection similarly does
> not distinguish between TCP and local socket connections, and it uses the host
> field to store the socket path (which is case sensitive, however). So you can
> use UPPERCASE when you define a connection in the database, but this is a
> problem for parsing connections from environment variables, because we
> currently cannot pass a URI whose socket path contains UPPERCASE characters.
> Since urlparse is really there to parse URLs and is not good at parsing
> non-URL URIs, we should probably use a different parser that handles more
> generic URIs, including not lowercasing the path, for all versions of Python.
> I think we could also consider adding a local path to the Connection model and
> using it instead of hostname to store the socket path. This approach would be
> the "correct" one, but it might introduce some compatibility issues, so maybe
> it's not worth it, considering that host is case sensitive in Airflow.
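For illustration, the workaround quoted in the comment above could be wrapped into a small standalone helper. This is only a sketch under assumptions: the name host_or_socket_path is hypothetical, it is not part of Airflow, urllib or dj-database-url, and it uses the Python 3 import path. It falls back to netloc (which urlparse leaves in its original case) only when the host looks like a percent-encoded path, so ordinary hostnames are still lowercased.

```python
from urllib.parse import urlparse, unquote


def host_or_socket_path(uri):
    """Return the host of ``uri``, keeping the case of encoded socket paths.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    parsed = urlparse(uri)
    hostname = parsed.hostname or ""
    # A percent-encoded "/" in the host suggests a UNIX socket path.
    if "%2f" in hostname.lower():
        # netloc keeps the original case on every Python version.
        netloc = parsed.netloc
        # Strip the "user:password@" prefix, if present.
        if "@" in netloc:
            netloc = netloc.rsplit("@", 1)[1]
        # Strip the ":port" suffix, if present.
        if ":" in netloc:
            netloc = netloc.split(":", 1)[0]
        # Decode %2F (and any other percent-escapes) back to characters.
        return unquote(netloc)
    # Regular hostnames: keep urlparse's lowercase normalisation.
    return hostname
```

With this sketch, host_or_socket_path("postgres://user:pw@%2Ftmp%2FSOCK_Dir:5432/db") returns '/tmp/SOCK_Dir', while host_or_socket_path("http://Example.COM:80/x") still returns 'example.com'. Unlike the quoted snippet, it decodes all percent-escapes via unquote rather than only %2F, which is a design choice of this sketch and may not suit every driver.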
> Snippet showing urlparse behaviour in different python versions:
> {quote}Python 2.7.10 (default, Aug 17 2018, 19:45:58)
> [GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from urlparse import urlparse,unquote
> >>> conn = urlparse("http://AAA")
> >>> conn.hostname
> 'aaa'
> >>> conn = urlparse("http://%2FAAA")
> >>> conn.hostname
> '%2faaa'
> {quote}
>
> {quote}Python 3.5.4 (v3.5.4:3f56838976, Aug 7 2017, 12:56:33)
> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from urlparse import urlparse,unquote
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ImportError: No module named 'urlparse'
> >>> from urllib.parse import urlparse,unquote
> >>> conn = urlparse("http://AAA")
> >>> conn.hostname
> 'aaa'
> >>> conn = urlparse("http://%2FAAA")
> >>> conn.hostname
> '%2faaa'
> {quote}
>
> {quote}Python 3.6.7 (v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14)
> [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from urllib.parse import urlparse,unquote
> >>> conn = urlparse("http://AAA")
> >>> conn.hostname
> 'aaa'
> >>> conn = urlparse("http://%2FAAA")
> >>> conn.hostname
> {color:#ff0000}'%2FAAA'{color}
> {quote}
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)