EricGao888 opened a new issue #21127: URL: https://github.com/apache/airflow/issues/21127
### Apache Airflow version

main (development)

### What happened

If a DAG's dag_id contains Chinese characters, downloading the logs of any task that belongs to that DAG leads to an 'Internal Server Error' page.

### What you expected to happen

Here is the webserver log related to the bug, produced in standalone mode:

webserver | [2022-01-26 18:29:15 +0800] [48511] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
webserver | Traceback (most recent call last):
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
webserver |     self.handle_request(listener, req, client, addr)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
webserver |     resp.write(item)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
webserver |     self.send_headers()
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
webserver |     util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
webserver |     return value.encode(encoding)
webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
webserver | 127.0.0.1 - - [26/Jan/2022:18:29:15 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
webserver | [2022-01-26 18:29:21 +0800] [48508] [ERROR] Error handling request /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
webserver | Traceback (most recent call last):
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
webserver |     self.handle_request(listener, req, client, addr)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
webserver |     resp.write(item)
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 327, in write
webserver |     self.send_headers()
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
webserver |     util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
webserver |   File "/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
webserver |     return value.encode(encoding)
webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in position 161-162: ordinal not in range(256)
webserver | 127.0.0.1 - - [26/Jan/2022:18:29:21 +0800] "GET /get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1 HTTP/1.1" 500 0 "-" "-"
triggerer | [2022-01-26 18:29:43,927] {triggerer_job.py:250} INFO - 0 triggers currently running

### How to reproduce

* I've tested Airflow v2.2.0 with the Celery executor, the Airflow development version in standalone mode, and Airflow v1.10.12 with the Celery executor. The bug exists in all three versions I tested.
* To reproduce, simply create a DAG whose dag_id contains Chinese characters, such as '测试'.
  After triggering the DAG, try to download the log file of any of its tasks from the tree view or graph view page, and you will be redirected to an 'Internal Server Error' page.

### Operating System

macOS Catalina, CentOS 7

### Versions of Apache Airflow Providers

_No response_

### Deployment

Other

### Deployment details

_No response_

### Anything else

* Following the error log produced by the webserver, I checked `/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py` line 322 and saw `util.write(self.sock, util.to_bytestring(header_str, "latin-1"))`.
* After changing `latin-1` to `utf-8`, the bug was fixed. The whole function is shown below; the commented-out line is the one I added.

```
def send_headers(self):
    if self.headers_sent:
        return
    tosend = self.default_headers()
    tosend.extend(["%s: %s\r\n" % (k, v) for k, v in self.headers])

    header_str = "%s\r\n" % "".join(tosend)
    util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
    # util.write(self.sock, util.to_bytestring(header_str, "utf-8"))
    self.headers_sent = True
```

* However, `gunicorn/http/wsgi.py` is not part of the Airflow code, and I haven't figured out how to fix this without changing that file. Is there a better way to fix it?

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
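
The `UnicodeEncodeError` in the traceback can be reproduced without gunicorn, which may help when looking for a fix on the Airflow side. The sketch below is only an illustration under stated assumptions: the issue does not say which response header carries the non-Latin-1 characters, so it assumes a `Content-Disposition` header whose filename embeds the dag_id. The filename `测试_sleep.log` and the helpers `encodes_as_latin1` and `ascii_safe_content_disposition` are hypothetical names, not Airflow or gunicorn APIs. It shows that percent-encoding the filename in the `filename*=UTF-8''...` form yields a header value that encodes cleanly as Latin-1.

```python
from urllib.parse import quote

dag_id = "测试"  # the dag_id used in the report

# Hypothetical header value: assumes the dag_id ends up in the download filename.
raw_header = f"attachment; filename={dag_id}_sleep.log"


def encodes_as_latin1(value: str) -> bool:
    """Return True if the value survives the same encode step gunicorn applies."""
    try:
        value.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def ascii_safe_content_disposition(filename: str) -> str:
    """Build a Content-Disposition value with a percent-encoded UTF-8 filename."""
    return f"attachment; filename*=UTF-8''{quote(filename, safe='')}"


safe_header = ascii_safe_content_disposition(f"{dag_id}_sleep.log")

print(encodes_as_latin1(raw_header))   # the raw value cannot be encoded
print(encodes_as_latin1(safe_header))  # the percent-encoded value can
print(safe_header)
```

If the assumption about the header holds, a change of this kind would live in the Airflow view that builds the response rather than in `gunicorn/http/wsgi.py`.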
