EricGao888 opened a new issue #21127:
URL: https://github.com/apache/airflow/issues/21127


   ### Apache Airflow version
   
   main (development)
   
   ### What happened
   
   If a DAG's `dag_id` contains Chinese characters, downloading the logs of 
any task in that DAG leads to an 'Internal Server Error' page.
   
![image](https://user-images.githubusercontent.com/34905992/151167538-59898b5c-8978-4b76-b732-bdfeff2afba8.png)
   
![image](https://user-images.githubusercontent.com/34905992/151167566-bb3627db-20fc-4614-a4fb-02b8ba8607c4.png)
   
   
   
   ### What you expected to happen
   
   Here is the webserver log for the bug, as produced in standalone mode:
   
   webserver | [2022-01-26 18:29:15 +0800] [48511] [ERROR] Error handling 
request 
/get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
    webserver | Traceback (most recent call last):
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py",
 line 136, in handle
    webserver | self.handle_request(listener, req, client, addr)
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py",
 line 185, in handle_request
    webserver | resp.write(item)
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py",
 line 327, in write
    webserver | self.send_headers()
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py",
 line 322, in send_headers
    webserver | util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", 
line 565, in to_bytestring
    webserver | return value.encode(encoding)
    webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in 
position 161-162: ordinal not in range(256)
    webserver | 127.0.0.1 - - [26/Jan/2022:18:29:15 +0800] "GET 
/get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
 HTTP/1.1" 500 0 "-" "-"
    webserver | [2022-01-26 18:29:21 +0800] [48508] [ERROR] Error handling 
request 
/get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
    webserver | Traceback (most recent call last):
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py",
 line 136, in handle
    webserver | self.handle_request(listener, req, client, addr)
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/workers/sync.py",
 line 185, in handle_request
    webserver | resp.write(item)
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py",
 line 327, in write
    webserver | self.send_headers()
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py",
 line 322, in send_headers
    webserver | util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
    webserver | File 
"/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/util.py", 
line 565, in to_bytestring
    webserver | return value.encode(encoding)
    webserver | UnicodeEncodeError: 'latin-1' codec can't encode characters in 
position 161-162: ordinal not in range(256)
    webserver | 127.0.0.1 - - [26/Jan/2022:18:29:21 +0800] "GET 
/get_logs_with_metadata?dag_id=%E6%B5%8B%E8%AF%95&task_id=sleep&execution_date=2022-01-25T09%3A23%3A42.145023%2B00%3A00&metadata=null&format=file&try_number=1
 HTTP/1.1" 500 0 "-" "-"
    triggerer | [2022-01-26 18:29:43,927] {triggerer_job.py:250} INFO - 0 
triggers currently running
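
The failure in the traceback can be reproduced outside Airflow. This is a minimal sketch, assuming the response carries a header line that embeds the raw `dag_id`. The header name and filename below are hypothetical; only the `latin-1` encoding step and the percent-encoded `dag_id` are taken from the log.

```python
from urllib.parse import unquote

# dag_id as it appears percent-encoded in the request URL from the log.
dag_id = unquote("%E6%B5%8B%E8%AF%95")

# Hypothetical header line; the real header contents are not shown in the log.
header_str = f"Content-Disposition: attachment; filename={dag_id}.log\r\n"


def try_encode(value: str, encoding: str) -> str:
    """Return 'ok' if value encodes cleanly, else the exception class name."""
    try:
        value.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return type(exc).__name__


# latin-1 covers only code points 0-255, so the two Chinese characters fail.
latin1_result = try_encode(header_str, "latin-1")
utf8_result = try_encode(header_str, "utf-8")
```

The two characters that cannot be encoded match the two-character range reported in the `UnicodeEncodeError` message.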
   
   
   ### How to reproduce
   
   * I've tested Airflow v2.2.0 with the Celery executor, the Airflow dev 
version in standalone mode, and Airflow v1.10.12 with the Celery executor. The 
bug exists in all three versions. 
   * To reproduce, create a DAG whose dag_id contains Chinese characters, such 
as '测试'. After triggering the DAG, try to download the log file of any task in 
it from the tree view or graph view page; you will be redirected to an 
'Internal Server Error' page.
   
   ### Operating System
   
   macOS Catalina, CentOS 7
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   * Following the error log produced by the webserver, I checked 
`/opt/anaconda3/envs/airflow_dev/lib/python3.8/site-packages/gunicorn/http/wsgi.py`
 line 322 and saw `util.write(self.sock, util.to_bytestring(header_str, 
"latin-1"))`.
   * After changing `latin-1` to `utf-8`, the bug was fixed. The whole function 
is shown below; the commented-out line is the one I added.
   * ```
     def send_headers(self):
         if self.headers_sent:
             return
         tosend = self.default_headers()
         tosend.extend(["%s: %s\r\n" % (k, v) for k, v in self.headers])

         header_str = "%s\r\n" % "".join(tosend)
         util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
         # util.write(self.sock, util.to_bytestring(header_str, "utf-8"))
         self.headers_sent = True
     ```
   * However, `gunicorn/http/wsgi.py` is not part of the Airflow code, and I 
haven't figured out how to fix this without changing that file. Is there a 
better way to fix it?
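
One possible direction that leaves gunicorn untouched is to make the header value ASCII-only before it reaches `send_headers`. The sketch below percent-encodes the filename using the `filename*=UTF-8''...` form described in RFC 6266. It assumes the non-Latin-1 text enters through a `Content-Disposition` filename, which the log does not confirm, and `build_content_disposition` is a hypothetical helper, not an Airflow or gunicorn API.

```python
from urllib.parse import quote


def build_content_disposition(filename: str) -> str:
    """Build an attachment header value that contains only ASCII characters."""
    # quote() encodes the text as UTF-8 and percent-escapes every unsafe byte.
    encoded = quote(filename, safe="")
    return f"attachment; filename*=UTF-8''{encoded}"


# Hypothetical filename derived from the dag_id in this report.
header_value = build_content_disposition("测试.log")

# The value is pure ASCII, so encoding it as latin-1 no longer raises.
header_bytes = header_value.encode("latin-1")
```

Whether this applies depends on which header actually carries the `dag_id`; the idea is only that anything placed in a response header should already be representable in `latin-1`.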
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
