aaronluo created AIRFLOW-4150:
---------------------------------
Summary: Modify the docker operator implementation
Key: AIRFLOW-4150
URL: https://issues.apache.org/jira/browse/AIRFLOW-4150
Project: Apache Airflow
Issue Type: Improvement
Components: docker
Reporter: aaronluo
1. I created a test Python script, testpython.py, that simply sleeps for a long time:
{quote}import time
time.sleep(1000)
{quote}
2. I created a DAG with a task that runs the script through a DockerOperator:
{quote}docker_ls = DockerOperator(
    task_id='docker_ls',
    image='python',
    working_dir='/data/wf/',
    command='python testpython.py',
    docker_url='http://192.168.1.215:2375',
    start_date=datetime(2015, 6, 1),
    volumes=['/data/wf:/data/wf/'],
    dag=dag
)
{quote}
3. When I run this DAG, the Celery worker is, unsurprisingly, occupied for a very long time, and the Docker container also runs for a long time. The operator blocks in this loop:
{quote}for line in self.cli.logs(container=self.container['Id'], stream=True):
    line = line.strip()
    if hasattr(line, 'decode'):
        line = line.decode('utf-8')
    self.log.info(line)

result = self.cli.wait(self.container['Id'])
if result['StatusCode'] != 0:
    raise AirflowException('docker container failed: ' + repr(result)){quote}
My suggestion is that after submitting the task to Docker, the corresponding Celery worker should monitor the container's status or events instead of blocking until the container exits, because long-running containers currently tie up a worker slot for their entire duration.
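As a rough illustration of the idea (not the proposed patch), the blocking {{cli.wait()}} call could be replaced by periodic polling of the container state, so the worker is free to do other things, or even exit, between checks. The helper below is a minimal sketch; it assumes a docker-py style client exposing {{inspect_container}}, and the function name {{poll_container}} and the poll interval are made up for illustration:

```python
import time


def poll_container(cli, container_id, poll_interval=5.0):
    """Poll the Docker daemon for the container state instead of
    blocking on cli.wait().

    `cli` is assumed to be a docker-py low-level APIClient (or anything
    exposing `inspect_container`). Between polls the worker could yield
    to other tasks, or persist the container id and terminate entirely,
    resuming via Docker's event stream later.
    """
    while True:
        state = cli.inspect_container(container_id)['State']
        if not state['Running']:
            # Container has exited; report its exit code to the caller.
            return state['ExitCode']
        time.sleep(poll_interval)
```

The operator would then raise AirflowException only when the returned exit code is non-zero, exactly as the current implementation does after {{cli.wait()}}.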
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)