[ 
https://issues.apache.org/jira/browse/AIRFLOW-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang closed AIRFLOW-855.
----------------------------
    Resolution: Won't Fix

> Security - Airflow SQLAlchemy PickleType Allows for Code Execution
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-855
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-855
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>         Attachments: test_dag.txt
>
>
> Impact: Anyone able to modify the application's underlying database, or a 
> computer where certain DAG tasks are executed, may execute arbitrary code on 
> the Airflow host.
> Location: The XCom class in /airflow-internal-master/airflow/models.py
> Description: Airflow uses the SQLAlchemy object-relational mapping (ORM) to 
> allow for a database agnostic, object-oriented manipulation of application 
> data. Database tables and values are expressed as classes (Python classes, in 
> this application), and the ORM transparently manipulates the underlying 
> database when those structures are accessed programmatically.
> Airflow defines the following class as the ORM model for an XCom:
> {code}
> class XCom(Base): 
>   """
>   Base class for XCom objects. 
>   """
>   __tablename__ = "xcom"
>   id = Column(Integer, primary_key=True) 
>   key = Column(String(512))
>   value = Column(PickleType(pickler=dill)) 
>   timestamp = Column(
>     DateTime, default=func.now(), nullable=False) 
>   execution_date = Column(DateTime, nullable=False)
> {code}
> XComs are used for inter-task communication, and their values are either 
> defined in a DAG, or the return value of the python_callable() function or 
> the task's execute() method, executed on a remote host. XCom values are, 
> according to this model, of the PickleType, meaning that objects assigned to 
> the value column are transparently serialized (when being written to) and 
> deserialized (when being read from). The deserialization of user-controlled 
> pickle objects allows for the execution of arbitrary code. This means that 
> "slaves" (where DAG code is executed) can compromise "masters" (where DAGs 
> are defined in code) by returning an object that, when serialized (and 
> subsequently deserialized), causes remote code execution. This can also be 
> triggered by anyone who has write access to this portion of the database.
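
A minimal, harmless sketch of the mechanism described above may help. It uses the standard-library pickle module as a stand-in for dill (assuming dill honors __reduce__ in the same way, since it builds on pickle), and the Payload class and the evaluated expression are hypothetical, chosen only for illustration. A real attacker would substitute a callable such as os.system.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form tells the unpickler to call eval()."""

    def __reduce__(self):
        # (callable, args): on load, the unpickler invokes eval("6 * 7").
        # The expression is harmless here; it stands in for attacker code.
        return (eval, ("6 * 7",))


# Roughly what happens when a value is written to a PickleType column.
blob = pickle.dumps(Payload())

# Roughly what happens when the value is read back: the callable runs,
# and its return value replaces the original object.
result = pickle.loads(blob)

print(type(result).__name__, result)
```

The value that comes back is whatever the callable returned, not a Payload instance, which is why the reader of the column cannot defend itself by inspecting the object after loading it.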
> Note: NCC Group plans to meet with developers in the coming days to discuss 
> this finding, and it will be updated to reflect any additional insight 
> provided by this meeting.
> Reproduction Steps:
> 1. Configure a local instance of Airflow.
> 2. Insert the attached DAG into your AIRFLOW_HOME/dags directory.
> This example models a slave returning a malicious object to a task's 
> python_callable by creating a portable object (with __reduce__) containing a 
> reverse shell and pushing it as an XCom's value. This value is serialized 
> upon xcom_push and deserialized upon xcom_pull.
> In an actual exploit scenario, this value would be a DAG function's return 
> value, as assigned by code within the function, executing on a malicious 
> remote machine.
> 3. Start a netcat listener on your machine's port 4444.
> 4. Execute this task from the command line with airflow run push 2016-11-17. 
> Note that your netcat listener has received a shell connect-back.
> Remediation: Consider the use of a custom SQLAlchemy data type that performs 
> this transparent serialization and deserialization, but with JSON (a 
> text-based exchange format), rather than pickles (which may contain code).
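
As a rough sketch of the remediation idea, the two conversions such a custom type would perform can be written with the standard-library json module alone. The JSONValueCodec class is hypothetical and is not Airflow's code; its method names mirror the process_bind_param and process_result_value hooks of SQLAlchemy's TypeDecorator, and wiring it into a real column type is left as an assumption.

```python
import json


class JSONValueCodec:
    """Hypothetical helper mirroring the two hooks a custom column type needs."""

    def process_bind_param(self, value, dialect=None):
        # On write: convert a Python value to JSON text for storage.
        # json.dumps raises TypeError for objects that are not plain data.
        if value is None:
            return None
        return json.dumps(value)

    def process_result_value(self, value, dialect=None):
        # On read: parsing JSON only builds dicts, lists, strings,
        # numbers, booleans, and None. No callable is ever invoked.
        if value is None:
            return None
        return json.loads(value)


class Payload:
    """Same kind of object used in a pickle attack (harmless expression)."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


codec = JSONValueCodec()

# Plain data survives the round trip unchanged.
stored = codec.process_bind_param({"rows": 3, "ok": True})
restored = codec.process_result_value(stored)

# An object that relies on __reduce__ is rejected at write time.
try:
    codec.process_bind_param(Payload())
    rejected = False
except TypeError:
    rejected = True
```

The trade-off is that only JSON-representable values can be stored, so any task that currently passes arbitrary Python objects through an XCom would need to convert them to plain data first.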



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
