[ 
https://issues.apache.org/jira/browse/AIRFLOW-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated AIRFLOW-855:
-----------------------------
    Description: 
Impact: Anyone able to modify the application's underlying database, or a 
computer where certain DAG tasks are executed, may execute arbitrary code on 
the Airflow host.
Location: The XCom class in /airflow-internal-master/airflow/models.py
Description: Airflow uses the SQLAlchemy object-relational mapping (ORM) to 
allow for database-agnostic, object-oriented manipulation of application 
data. Database tables and values are expressed as Python classes, and the ORM 
transparently manipulates the underlying database when these structures are 
accessed programmatically.
Airflow defines the following class as the ORM model for an XCom:
{code:title=models.py|borderStyle=solid}
class XCom(Base): 
  """
  Base class for XCom objects. 
  """
  __tablename__ = "xcom"
  id = Column(Integer, primary_key=True) 
  key = Column(String(512))
  value = Column(PickleType(pickler=dill)) 
  timestamp = Column(
    DateTime, default=func.now(), nullable=False) 
  execution_date = Column(DateTime, nullable=False)
{code}
XComs are used for inter-task communication. Their values are either 
defined in a DAG or taken from the return value of the python_callable() 
function or the task's execute() method, executed on a remote host. According 
to this model, XCom values are of the PickleType, meaning that objects assigned 
to the value column are transparently serialized when written and deserialized 
when read. Deserializing user-controlled pickle objects allows for the 
execution of arbitrary code. This means that "slaves" (where DAG code is 
executed) can compromise "masters" (where DAGs are defined in code) by 
returning an object that, once serialized and subsequently deserialized, 
causes remote code execution. This can also be triggered by anyone who has 
write access to this portion of the database.
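To make the mechanism concrete, the following is a minimal and deliberately harmless sketch of why unpickling untrusted data runs code. It is not the attached DAG and does not use Airflow or SQLAlchemy; the class name `Payload` and the choice of `print` as the callable are illustrative assumptions. An attacker would substitute a callable that runs a command.

```python
import contextlib
import io
import pickle


class Payload:
    """Hypothetical object whose pickle form is a function call."""

    def __reduce__(self):
        # pickle stores (callable, args); on load it calls callable(*args).
        # A harmless print() stands in for an attacker-chosen callable.
        return (print, ("code ran during unpickling",))


# Serializing stores only the callable reference and its arguments.
blob = pickle.dumps(Payload())

# Deserializing invokes the callable; capture stdout to observe the effect.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = pickle.loads(blob)

captured = buffer.getvalue()
```

Here `pickle.loads` never reconstructs a `Payload` instance. It calls the stored callable and returns whatever that call returns, which is why the reader of the value, not the writer, ends up executing the code.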
Note: NCC Group plans to meet with developers in the coming days to discuss 
this finding, and it will be updated to reflect any additional insight provided 
by this meeting.
Reproduction Steps:
1. Configure a local instance of Airflow.
2. Insert the attached DAG into your AIRFLOW_HOME/dags directory.
This example models a slave returning a malicious object to a task's 
python_callable by creating a portable object (with __reduce__) containing a 
reverse shell and pushing it as an XCom's value. This value is serialized upon 
xcom_push and deserialized upon xcom_pull.
In an actual exploit scenario, this value would be a DAG function's return 
value, as assigned by code within the function, executing on a malicious remote 
machine.
3. Start a netcat listener on your machine's port 4444.
4. Execute this task from the command line with airflow run push 2016-11-17. 
Your netcat listener should then receive a shell connect-back.
Remediation: Consider the use of a custom SQLAlchemy data type that performs 
this transparent serialization and deserialization, but with JSON (a text-based 
exchange format), rather than pickles (which may contain code).
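As a rough illustration of this remediation, the sketch below shows the two conversions such a type would perform, using only the standard library. The function names `encode_xcom_value` and `decode_xcom_value` are hypothetical; in SQLAlchemy, logic like this would typically live in the `process_bind_param` and `process_result_value` hooks of a `TypeDecorator` subclass backed by a text column. This is an assumption about one possible design, not Airflow's actual fix.

```python
import json


def encode_xcom_value(value):
    """Hypothetical write-side hook: turn a value into JSON text."""
    # json.dumps raises TypeError for objects that are not plain data,
    # so arbitrary Python objects are rejected instead of being stored.
    return json.dumps(value)


def decode_xcom_value(text):
    """Hypothetical read-side hook: parse JSON text back into plain data."""
    if text is None:  # a NULL column value stays None
        return None
    # json.loads only builds dicts, lists, strings, numbers, booleans, None.
    return json.loads(text)


# Plain data survives the round trip unchanged.
stored = encode_xcom_value({"rows": 3, "status": "ok"})
restored = decode_xcom_value(stored)

# An arbitrary object cannot be written at all.
try:
    encode_xcom_value(object())
    rejected = False
except TypeError:
    rejected = True
```

The trade-off is that XCom values become limited to JSON-representable data, so tasks that currently exchange arbitrary Python objects would need to convert them explicitly.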


> Security - Airflow SQLAlchemy PickleType Allows for Code Execution
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-855
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-855
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Rui Wang
>         Attachments: test_dag.txt
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
