ZuhairORZaki opened a new pull request, #36469:
URL: https://github.com/apache/airflow/pull/36469

   While triaging your project, our bug fixing tool generated the following 
message-
   
   In file: `views.py`, there is a method that is vulnerable to XSS which can 
compromise any cookies, session tokens and other sensitive information used 
with the website and browser.
   
   
[views.py](https://github.com/apache/airflow/blob/main/airflow/www/views.py#L2633)
   ```python
   @expose(\"/confirm\", methods=[\"GET\"])
   @auth.has_access_dag(\"PUT\", DagAccessEntity.TASK_INSTANCE)
   @action_logging
   def confirm(self):
       \"\"\"Show confirmation page for marking tasks as success or 
failed.\"\"\"
       args = request.args
   
       ...
   
       dag_id = args.get(\"dag_id\")
   
       ...
   
       dag = get_airflow_app().dag_bag.get_dag(dag_id)
       if not dag:
           msg = f\"DAG {dag_id} not found\"
           return redirect_or_json(origin, msg, status=\"error\", 
status_code=404)
   ```
   
   Here request handler method `confirm` reads the value of request parameter 
`dag_id`. If no associated `dag` object is found then it cobstructs a string 
`msg` with it and passes it to `redirect_or_json` method which sends back a 
response containing that `msg`. Since flask automatically url decodes request 
parameters `dag_id` can be assigned javascript code which will be sent back as 
part of the response. 
   
   But the response content-type is application/json, so injected payload 
remains as a JSON data value in a JSON name/value pair and doesn't execute. So 
this isn't exactly an XSS vulnerability. Still if a client-side script receives 
this JSON value and uses some unsafe DOM update method e.g. `setInnerHTML`, 
`document.write` etc. then this could be a case of client-side XSS 
vulnerability.
   
   ### Exploitable Scenario
   
   Consider the following simple client side page.
   
   ```html
   <!DOCTYPE html>
   <html lang=\"en\">
   <head>
       <meta charset=\"UTF-8\">
       <meta name=\"viewport\" content=\"width=device-width, 
initial-scale=1.0\">
       <title>Fetch and Display Data</title>
   </head>
   <body>
       <div id=\"detail\"></div>
   
   <script>
       document.addEventListener('DOMContentLoaded', function() {
           var pos=document.URL.indexOf(\"dag_id=\")+7;
   
           var dagId = decodeURIComponent(document.URL.substring(pos));
   
           fetch(`https://example.airflow.com/confirm?dag_id=${dagId}`)
               .then(response => response.text())
               .then(data => {
                   document.getElementById(\"detail\").innerHTML = data;
   
               })
               .catch(error => console.error('Error:', error));
       });
   </script>
   
   </body>
   </html>
   ```
   
   Let's assume the page invoked with an URL such as
   ```
   http://www.some.site/confirm_page.html?dag_id=123
   ```
   The script wihtin the page reads the `dag_id` parameter value from the url 
and sends a GET request to the above server with that value. The server then 
sends a Not Found response message if no associated dag object is found with 
the `dag_id` value included. 
   
   Now an XSS attack can be carried out by sending an URL like below to a 
victim.
   ```
   
http://www.some.site/confirm_page.html?dag_id=%3Cimg%20src%3D%27x%27%20onerror%3D%27alert%281%29%27%3E
   ```
   
   The payload here is a URL-encoded form of the following:
   
   ```
   <img src='x' onerror='alert(1)'>
   ```
   
   Since the client-side code here uses an unsafe DOM update method 
`(element).innerHTML` the injected payload will execute in the victim's browser.
   
   ### Suggested Fix
   
   One possible fix is sanitizing user provided data `dag_id` by defining a 
regex pattern that represents all inappropriate characters like the following:
   
   ```python
   pattern = re.compile(r'[^a-zA-Z0-9_\-]')
   ```
   Then any character that matches the above pattern can be replaced with an 
empty string using following statement:
   
   ```python
   dag_id = re.sub(pattern, '', dag_id)
   ```
   Now if an attacker supplies javascript code as value for `dag_id` e.g. 
`<script>alert(1)</script>`, it will just become `scriptalert1script`.
   
   ### CLA Requirements:
   This section is only relevant if your project requires contributors to sign 
a Contributor License Agreement (CLA) for external contributions.
   
   All contributed commits are already automatically signed off.
   
   The meaning of a signoff depends on the project, but it typically certifies 
that committer has the rights to submit this work under the same license and 
agrees to a Developer Certificate of Origin (see 
[https://developercertificate.org/](https://developercertificate.org/) for more 
information).
   - [Git Commit SignOff documentation](https://developercertificate.org/)
   
   ### Sponsorship and Support:
   This work is done by the security researchers from OpenRefactory and is 
supported by the [Open Source Security Foundation 
(OpenSSF)](https://openssf.org/): [Project 
Alpha-Omega](https://alpha-omega.dev/). Alpha-Omega is a project partnering 
with open source software project maintainers to systematically find new, 
as-yet-undiscovered vulnerabilities in open source code - and get them fixed - 
to improve global software supply chain security.
   
   The bug is found by running the iCR tool by [OpenRefactory, 
Inc.](https://openrefactory.com/) and then manually triaging the results.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to