jnunezgts opened a new issue #11617:
URL: https://github.com/apache/airflow/issues/11617
**Apache Airflow version**:
1.10.12, using SQLite as the backend
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`):
N/A. Using Docker Swarm 19.03.8
**Environment**:
- **Cloud provider or hardware configuration**:
No cloud, bare-metal server:
```
HP ProLiant DL560 Gen8, BIOS P77 12/20/2013, 64 cpus
```
- **OS** (e.g. from /etc/os-release):
```
Fedora release 29 (Twenty Nine)
```
- **Kernel** (e.g. `uname -a`):
```
Linux server.company.com 4.19.82-1300.fc29.x86_64 #1 SMP Fri Nov 8 10:49:58 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
```
- **Install tools**:
```
pip
```
- **Others**:
```
Python 3.7.2 (default, Jan 16 2019, 19:49:22)
[GCC 8.2.1 20181215 (Red Hat 8.2.1-6)] on linux
```
Docker info:
```
Client:
 Debug Mode: false

Server:
 Containers: 21
  Running: 0
  Paused: 0
  Stopped: 21
 Images: 12
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: j0pl320hoxuqcaaa14z2znvgo
  Is Manager: true
  ClusterID: kpgz783mpw8aapdxchtwdu2ff
  Managers: 1
  Nodes: 4
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 172.29.248.55
  Manager Addresses:
   172.29.248.55:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.82-1300.fc29.x86_64
 Operating System: Fedora 29 (Twenty Nine)
 OSType: linux
 Architecture: x86_64
 CPUs: 64
 Total Memory: 125.9GiB
 Name: server.company.com
 ID: 7ESU:O253:JGNS:YJIY:XXX:CYTI:WFQC:6L5C:XXXX:62IO:VH23:XXXX
 Docker Root Dir: /opt/docker
 Debug Mode: false
 HTTP Proxy: http://proxy.company.com:8080/
 HTTPS Proxy: http://proxy.company:8080/
 No Proxy: localhost,127.0.0.1,server.company.com,.company.com
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  privatereg.company.com:5000
  localhost:5000
  server.company.com:5000
  127.0.0.0/8
 Live Restore Enabled: false
```
**What happened**:
Created the following DAG to schedule a one-time job:
```
from datetime import datetime
from datetime import timedelta

from airflow import DAG
from airflow.contrib.operators.docker_swarm_operator import DockerSwarmOperator

DEFAULT_ARGS = {
    'retry_delay': timedelta(minutes=5),
    'retries': 1,
    'email_on_failure': True,
    'email_on_retry': False,
    'email': ['[email protected]']
}

with DAG('24_7_box', description='24 x 7. With retries',
         default_args=DEFAULT_ARGS, schedule_interval='0 * * * Mon-Sun',
         start_date=datetime(2019, 7, 23), max_active_runs=1,
         catchup=False) as twenty_four_by_seven_dag:
    # See:
    # https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/docker_swarm_operator/index.html
    # https://airflow.apache.org/docs/stable/_modules/airflow/contrib/operators/docker_swarm_operator.html
    SLEEP_TASK = DockerSwarmOperator(
        task_id="SLEEP_TASK",
        image="fedora:29",
        api_version="auto",
        command="/bin/sleep 60",
        docker_url="unix://var/run/docker-sysavtbuild.sock",
        force_pull=False,
        mem_limit="500m",
        auto_remove=True,
    )
    SLEEP_TASK
```
**What you expected to happen**:
I expected the container to be created, stay alive for 60 seconds, and then exit with code 0. No output.
Others have reported success in the past using the [Docker Swarm Operator](https://airflow.apache.org/docs/stable/_api/airflow/contrib/operators/docker_swarm_operator/).
Not sure what went wrong. The Airflow log shows the following:
```
[2020-10-17 09:46:58,475] {taskinstance.py:1150} ERROR - 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
```
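Based on the error, the operator appears to forward `mem_limit` straight into the Swarm service's `Resources`, whose `MemoryBytes` field is an `int64`, so a string like `"500m"` fails to unmarshal on the daemon side. A possible workaround, until the operator converts the value itself, is to pass `mem_limit` as an integer byte count. A minimal sketch of the conversion (`parse_mem_limit` is a hypothetical helper, not part of Airflow or docker-py):

```python
import re

# Unit multipliers for Docker-style memory strings ("b", "k", "m", "g").
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_mem_limit(value):
    """Convert a memory limit like "500m" into an integer byte count,
    the type the Swarm API's Resources.MemoryBytes (int64) expects."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"(\d+)([bkmg]?)", value.strip().lower())
    if not match:
        raise ValueError("unrecognized memory limit: %r" % value)
    number, unit = match.groups()
    return int(number) * UNITS[unit or "b"]

print(parse_mem_limit("500m"))  # 524288000
```

With a conversion like this, the DAG could pass `mem_limit=parse_mem_limit("500m")` as a stopgap, though converting inside the operator itself would be the proper upstream fix.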
I can run a similar container from the Docker CLI without issue:
```
[user@server dags]$ docker run --rm --detach fedora:29 /bin/sleep 45
29912c34f43e2dfa20d417cb80113059a183518b99215609c0aa7b37874c27db
[user@server dags]$ docker ps
CONTAINER ID   IMAGE       COMMAND           CREATED         STATUS         PORTS   NAMES
29912c34f43e   fedora:29   "/bin/sleep 45"   7 seconds ago   Up 6 seconds           gifted_pare
```
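Note that `docker run` does not go through the Swarm services endpoint the operator uses (`/services/create`). A closer CLI reproduction, assuming it is run on the Swarm manager node, would be the following; it succeeds because the CLI converts the `500m` suffix to an integer byte count before calling the API:

```
docker service create --detach --restart-condition none \
  --limit-memory 500m fedora:29 /bin/sleep 45
```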
**How to reproduce it**:
1. Copy the DAG provided into `~/airflow/dags`
2. Turn on the DAG
3. Trigger the DAG or let the scheduler run it. The error will show up.
**Anything else we need to know**:
<details><summary>Airflow.log</summary>

```
*** Reading local file: /home/user/airflow/logs/avt_24_7_box/SLEEP_TASK/2020-10-17T13:24:26.101897+00:00/2.log
[2020-10-17 09:46:58,312] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: avt_24_7_box.SLEEP_TASK 2020-10-17T13:24:26.101897+00:00 [queued]>
[2020-10-17 09:46:58,321] {taskinstance.py:670} INFO - Dependencies all met for <TaskInstance: avt_24_7_box.SLEEP_TASK 2020-10-17T13:24:26.101897+00:00 [queued]>
[2020-10-17 09:46:58,321] {taskinstance.py:880} INFO -
--------------------------------------------------------------------------------
[2020-10-17 09:46:58,321] {taskinstance.py:881} INFO - Starting attempt 2 of 2
[2020-10-17 09:46:58,321] {taskinstance.py:882} INFO -
--------------------------------------------------------------------------------
[2020-10-17 09:46:58,328] {taskinstance.py:901} INFO - Executing <Task(DockerSwarmOperator): SLEEP_TASK> on 2020-10-17T13:24:26.101897+00:00
[2020-10-17 09:46:58,335] {standard_task_runner.py:54} INFO - Started process 35637 to run task
[2020-10-17 09:46:58,371] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', '24_7_box', 'SLEEP_TASK', '2020-10-17T13:24:26.101897+00:00', '--job_id', '55', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/avt_test3.py', '--cfg_path', '/tmp/tmpaivrdhuu']
[2020-10-17 09:46:58,372] {standard_task_runner.py:78} INFO - Job 55: Subtask SLEEP_TASK
[2020-10-17 09:46:58,398] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: avt_24_7_box.SLEEP_TASK 2020-10-17T13:24:26.101897+00:00 [running]> server.company.com
[2020-10-17 09:46:58,467] {docker_swarm_operator.py:105} INFO - Starting docker service from image fedora:29
[2020-10-17 09:46:58,475] {taskinstance.py:1150} ERROR - 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
Traceback (most recent call last):
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 259, in _raise_for_status
    response.raise_for_status()
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http+docker://localhost/v1.40/services/create

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/operators/docker_operator.py", line 277, in execute
    return self._run_image()
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/airflow/contrib/operators/docker_swarm_operator.py", line 119, in _run_image
    labels={'name': 'airflow__%s__%s' % (self.dag_id, self.task_id)}
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/utils/decorators.py", line 34, in wrapper
    return f(self, *args, **kwargs)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/service.py", line 190, in create_service
    self._post_json(url, data=data, headers=headers), True
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 265, in _result
    self._raise_for_status(response)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/api/client.py", line 261, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/home/user/virtualenv/airflow/lib64/python3.7/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 400 Client Error: Bad Request ("json: cannot unmarshal string into Go struct field Resources.MemoryBytes of type int64")
[2020-10-17 09:46:58,481] {taskinstance.py:1194} INFO - Marking task as FAILED. dag_id=24_7_box, task_id=SLEEP_TASK, execution_date=20201017T132426, start_date=20201017T134658, end_date=20201017T134658
[2020-10-17 09:46:58,509] {configuration.py:373} WARNING - section/key [smtp/smtp_user] not found in config
[2020-10-17 09:46:58,583] {email.py:132} INFO - Sent an alert email to ['[email protected]']
[2020-10-17 09:47:03,312] {local_task_job.py:102} INFO - Task exited with return code 1
```

</details>
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]