[ https://issues.apache.org/jira/browse/AIRFLOW-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Wang closed AIRFLOW-855.
----------------------------
    Resolution: Won't Fix

> Security - Airflow SQLAlchemy PickleType Allows for Code Execution
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-855
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-855
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>         Attachments: test_dag.txt
>
>
> Impact: Anyone able to modify the application's underlying database, or a
> computer where certain DAG tasks are executed, may execute arbitrary code on
> the Airflow host.
>
> Location: The XCom class in /airflow-internal-master/airflow/models.py
>
> Description: Airflow uses the SQLAlchemy object-relational mapping (ORM) to
> allow for database-agnostic, object-oriented manipulation of application
> data. Database tables and values are expressed as classes (Python classes,
> in this application), and the ORM transparently manipulates the underlying
> database when you programmatically access these structures.
>
> Airflow defines the following class as the XCom ORM model:
> {code}
> class XCom(Base):
>     """
>     Base class for XCom objects.
>     """
>     __tablename__ = "xcom"
>     id = Column(Integer, primary_key=True)
>     key = Column(String(512))
>     value = Column(PickleType(pickler=dill))
>     timestamp = Column(
>         DateTime, default=func.now(), nullable=False)
>     execution_date = Column(DateTime, nullable=False)
> {code}
> XComs are used for inter-task communication, and their values are either
> defined in a DAG or are the return value of the python_callable() function
> or the task's execute() method, executed on a remote host. XCom values are,
> according to this model, of the PickleType, meaning that objects assigned to
> the value column are transparently serialized (when being written) and
> deserialized (when being read). The deserialization of user-controlled
> pickle objects allows for the execution of arbitrary code. This means that
> "slaves" (where DAG code is executed) can compromise "masters" (where DAGs
> are defined in code) by returning an object that, when serialized (and
> subsequently deserialized), causes remote code execution. This can also be
> triggered by anyone who has write access to this portion of the database.
>
> Note: NCC Group plans to meet with developers in the coming days to discuss
> this finding, and it will be updated to reflect any additional insight
> provided by this meeting.
>
> Reproduction Steps:
> 1. Configure a local instance of Airflow.
> 2. Insert the attached DAG into your AIRFLOW_HOME/dags directory.
>    This example models a slave returning a malicious object to a task's
>    python_callable by creating a portable object (with __reduce__)
>    containing a reverse shell and pushing it as an XCom's value. This value
>    is serialized upon xcom_push and deserialized upon xcom_pull.
>    In an actual exploit scenario, this value would be a DAG function's
>    return value, as assigned by code within the function, executing on a
>    malicious remote machine.
> 3. Start a netcat listener on your machine's port 4444.
> 4. Execute this task from the command line with airflow run push 2016-11-17.
>    Note that your netcat listener has received a shell connect-back.
>
> Remediation: Consider the use of a custom SQLAlchemy data type that performs
> this transparent serialization and deserialization, but with JSON (a
> text-based exchange format), rather than pickles (which may contain code).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
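
The difference between the two serialization formats described in the issue can be illustrated with a minimal, standard-library-only sketch. It is not Airflow's code and not the attached test_dag.txt: the class EvalOnLoad and the helpers to_json_value and from_json_value are hypothetical names chosen for illustration, and a harmless arithmetic expression stands in for the reverse shell. A real fix along the lines of the Remediation note would presumably wrap the JSON helpers in a custom SQLAlchemy column type, which is not shown here.

```python
import json
import pickle


class EvalOnLoad:
    """Hypothetical payload: its pickle form asks the unpickler to call eval()."""

    def __reduce__(self):
        # pickle stores (callable, args) and pickle.loads() calls callable(*args).
        # A harmless expression is used here in place of a real attack payload.
        return (eval, ("6 * 7",))


def to_json_value(value):
    """Serialize a value as JSON text (hypothetical helper).

    json.dumps raises TypeError for objects that are not plain JSON data.
    """
    return json.dumps(value)


def from_json_value(text):
    """Parse JSON text (hypothetical helper).

    The result is built only from dicts, lists, strings, numbers,
    booleans, and None; no callable stored in the data is invoked.
    """
    return json.loads(text)


# Pickle round trip: loading the bytes runs eval("6 * 7") instead of
# rebuilding an EvalOnLoad instance.
blob = pickle.dumps(EvalOnLoad())
loaded = pickle.loads(blob)

# JSON round trip: plain data survives, arbitrary objects are rejected on write.
round_tripped = from_json_value(to_json_value({"rows": 3, "ok": True}))
try:
    to_json_value(EvalOnLoad())
    json_rejected_object = False
except TypeError:
    json_rejected_object = True
```

In this sketch the value that comes back from pickle.loads is the result of the evaluated expression, which is the behavior the report exploits; the JSON path can only carry data, at the cost of rejecting values that are not JSON-serializable.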