stromal opened a new issue, #27131:
URL: https://github.com/apache/airflow/issues/27131
### Apache Airflow version
2.4.1
### What happened
# Situation
- I have a DAG that is scheduled to run at 1:30 am UTC
- but the logs show that it started running at 2022-10-19, 08:09:12 UTC
- It ran for about the same duration as before and delivered the data correctly from point A to B
- I haven't modified this DAG file for days
- nothing else is scheduled for this time period at all
- Looking at the previous logs, sometimes it starts running at 1:30 am UTC and sometimes at totally different, random times
- (these are Saturday and Sunday runs; I am the only one with access to Airflow and the files, no one else modified them, and I definitely did not touch or trigger them on Saturday or Sunday)
# System
- ubuntu 20.04 LTS
- Docker Airflow
- AWS EC2
- x86 AMD / x86 Intel CPU
# Dockerfile
```
FROM apache/airflow:2.4.1-python3.8

# ExternalPythonOperator
# Required: switch this parameter off to use your own venv via ExternalPythonOperator
ENV PIP_USER=false

# Python venv setup that you have created locally
RUN python3 -m venv /opt/airflow/my_virtual_env

# Install dependencies:
COPY airflow/requirements.txt .

# Copy over the folder that contains my own local pip package
COPY airflow/my_own_pip_package_folder /opt/airflow/my_own_pip_package_folder

# Copy over whatever file
COPY my_file.py /opt/airflow/my_file.py

# Install packages into my own virtual environment inside the Docker image
RUN /opt/airflow/my_virtual_env/bin/pip install -r requirements.txt
RUN /opt/airflow/my_virtual_env/bin/pip install /opt/airflow/my_own_pip_package_folder
RUN /opt/airflow/my_virtual_env/bin/pip install 'apache-airflow==2.4.1' --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.4.1/constraints-3.8.txt"
```
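As a side note, one quick way to sanity-check that the venv interpreter baked into the image actually works is to invoke it as a subprocess, the same way `ExternalPythonOperator` launches it. This is a minimal sketch with a hypothetical helper (`check_interpreter` is not part of the setup above); inside the container you would pass the venv path from the Dockerfile.

```python
import subprocess
import sys


def check_interpreter(python_path: str) -> str:
    """Run the given interpreter and return what it prints; raise if it fails."""
    result = subprocess.run(
        [python_path, "-c", "import sys; print(sys.version_info[:2])"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Inside the container you would pass '/opt/airflow/my_virtual_env/bin/python3';
# here the current interpreter stands in for it.
print(check_interpreter(sys.executable))
```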
# My DAG File
```
"""
Example DAG demonstrating the usage of the TaskFlow API to execute Python
functions natively and within a virtual environment.
"""
from __future__ import annotations

import logging
import os
import shutil
import sys
import tempfile
import time
from pprint import pprint

import pendulum

from airflow import DAG
from airflow.decorators import task

log = logging.getLogger(__name__)

PYTHON = sys.executable
BASE_DIR = tempfile.gettempdir()

with DAG(
    dag_id='my_dag',
    schedule='30 1 * * *',  # in UTC; e.g. 11:12 UTC = 12:12 BST = 12:12 UK time
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    tags=['Redshift'],
) as dag:

    @task.external_python(
        task_id="my_dag_task",
        python='/opt/airflow/my_virtual_env/bin/python3',
    )
    def go():
        """
        Example function that will be performed in a virtual environment.
        Importing at the module level ensures that it will not attempt to
        import the library before it is installed.
        """
        # my imports
        # my code

    external_python_task = go()
```
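For context, under Airflow's documented data-interval semantics (standard behavior, not specific to this DAG): with a daily cron schedule, a run's logical date marks the start of its data interval, and the scheduler only creates the run once that interval ends, one day later. A minimal sketch of the expected earliest start time:

```python
from datetime import datetime, timedelta, timezone


def expected_earliest_start(logical_date: datetime) -> datetime:
    """For a daily schedule such as '30 1 * * *', the logical date marks the
    START of a one-day data interval; the scheduler creates the run when
    that interval ENDS, one day later."""
    return logical_date + timedelta(days=1)


# The run scheduled__2022-10-18T01:30:00+00:00 should therefore start at the
# earliest at 2022-10-19 01:30 UTC.
logical = datetime(2022, 10, 18, 1, 30, tzinfo=timezone.utc)
print(expected_earliest_start(logical).isoformat())
# 2022-10-19T01:30:00+00:00
```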
# Logs
## Log of a run at the wrong time, which ran without failure (Wednesday early morning)
```
4d37309d53a3
*** Reading local file: /opt/airflow/logs/dag_id=my_dag/run_id=scheduled__2022-10-18T01:30:00+00:00/task_id=my_dag_task/attempt=1.log
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-18T01:30:00+00:00 [queued]>
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-18T01:30:00+00:00 [queued]>
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1362} INFO -
--------------------------------------------------------------------------------
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 of 1
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1364} INFO -
--------------------------------------------------------------------------------
[2022-10-19, 08:09:12 UTC] {taskinstance.py:1383} INFO - Executing <Task(_PythonExternalDecoratedOperator): my_dag_task> on 2022-10-18 01:30:00+00:00
[2022-10-19, 08:09:12 UTC] {standard_task_runner.py:54} INFO - Started process 58 to run task
[2022-10-19, 08:09:12 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'my_dag', 'my_dag_task', 'scheduled__2022-10-18T01:30:00+00:00', '--job-id', '235', '--raw', '--subdir', 'DAGS_FOLDER/my_dag.py', '--cfg-path', '/tmp/tmpps6y22oi']
[2022-10-19, 08:09:12 UTC] {standard_task_runner.py:83} INFO - Job 235: Subtask my_dag_task
[2022-10-19, 08:09:12 UTC] {dagbag.py:525} INFO - Filling up the DagBag from /opt/airflow/dags/my_dag.py
[2022-10-19, 08:09:12 UTC] {task_command.py:384} INFO - Running <TaskInstance: my_dag.my_dag_task scheduled__2022-10-18T01:30:00+00:00 [running]> on host 4d37309d53a3
[2022-10-19, 08:09:13 UTC] {taskinstance.py:1590} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=my_dag
AIRFLOW_CTX_TASK_ID=my_dag_task
AIRFLOW_CTX_EXECUTION_DATE=2022-10-18T01:30:00+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-10-18T01:30:00+00:00
[2022-10-19, 08:09:16 UTC] {process_utils.py:179} INFO - Executing cmd: /opt/airflow/my_virtual_env/bin/python3 /tmp/tmdkd47ow0r/script.py /tmp/tmdkd47ow0r/script.in /tmp/tmdkd47ow0r/script.out /tmp/tmdkd47ow0r/string_args.txt
[2022-10-19, 08:09:16 UTC] {process_utils.py:183} INFO - Output:
[2022-10-19, 08:09:18 UTC] {process_utils.py:187} INFO - WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
[2022-10-19, 08:09:22 UTC] {process_utils.py:187} INFO - startdate: 2022-10-18T00:00:00.000 enddate: 2022-10-18T23:59:59.000
[2022-10-19, 08:09:22 UTC] {process_utils.py:187} INFO - [2022-10-19 08:09:22,282] {credentials.py:1108} INFO - Found credentials from IAM Role: BI-Read-Creds-2
[2022-10-19, 08:09:30 UTC] {python.py:177} INFO - Done. Returned value was: [True, 'schema.table_sql']
[2022-10-19, 08:09:30 UTC] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=my_dag, task_id=my_dag_task, execution_date=20221018T013000, start_date=20221019T080912, end_date=20221019T080930
[2022-10-19, 08:09:31 UTC] {local_task_job.py:164} INFO - Task exited with return code 0
[2022-10-19, 08:09:31 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
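To quantify the anomaly, the compact timestamps in the SUCCESS line above can be parsed and compared. Assuming the usual daily data-interval semantics (earliest expected start = execution_date plus one day), this run started roughly six and a half hours late:

```python
from datetime import datetime, timedelta, timezone

# Timestamps copied from the SUCCESS line of the log above.
fmt = "%Y%m%dT%H%M%S"
execution_date = datetime.strptime("20221018T013000", fmt).replace(tzinfo=timezone.utc)
start_date = datetime.strptime("20221019T080912", fmt).replace(tzinfo=timezone.utc)

# Under daily data-interval semantics, the earliest expected start is one
# interval (one day) after the execution_date.
expected_start = execution_date + timedelta(days=1)
delay = start_date - expected_start
print(delay)  # 6:39:12
```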
## Log of a correctly timed run, which ran without failure (Tuesday evening)
```
9adeff0b1844
*** Reading local file: /opt/airflow/logs/dag_id=my_dag/run_id=scheduled__2022-10-17T01:30:00+00:00/task_id=my_dag_task/attempt=1.log
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-17T01:30:00+00:00 [queued]>
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1165} INFO - Dependencies all met for <TaskInstance: my_dag.my_dag_task scheduled__2022-10-17T01:30:00+00:00 [queued]>
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1362} INFO -
--------------------------------------------------------------------------------
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1363} INFO - Starting attempt 1 of 1
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1364} INFO -
--------------------------------------------------------------------------------
[2022-10-18, 01:30:00 UTC] {taskinstance.py:1383} INFO - Executing <Task(_PythonExternalDecoratedOperator): my_dag_task> on 2022-10-17 01:30:00+00:00
[2022-10-18, 01:30:00 UTC] {standard_task_runner.py:54} INFO - Started process 23538 to run task
[2022-10-18, 01:30:00 UTC] {standard_task_runner.py:82} INFO - Running: ['airflow', 'tasks', 'run', 'my_dag', 'my_dag_task', 'scheduled__2022-10-17T01:30:00+00:00', '--job-id', '212', '--raw', '--subdir', 'DAGS_FOLDER/my_dag.py', '--cfg-path', '/tmp/tmp4aa6swro']
[2022-10-18, 01:30:00 UTC] {standard_task_runner.py:83} INFO - Job 212: Subtask my_dag_task
[2022-10-18, 01:30:00 UTC] {dagbag.py:525} INFO - Filling up the DagBag from /opt/airflow/dags/my_dag.py
[2022-10-18, 01:30:01 UTC] {task_command.py:384} INFO - Running <TaskInstance: my_dag.my_dag_task scheduled__2022-10-17T01:30:00+00:00 [running]> on host 9adeff0b1844
[2022-10-18, 01:30:01 UTC] {taskinstance.py:1590} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=my_dag
AIRFLOW_CTX_TASK_ID=my_dag_task
AIRFLOW_CTX_EXECUTION_DATE=2022-10-17T01:30:00+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-10-17T01:30:00+00:00
[2022-10-18, 01:30:02 UTC] {process_utils.py:179} INFO - Executing cmd: /opt/airflow/my_virtual_env/bin/python3 /tmp/tmdebmcqmsm/script.py /tmp/tmdebmcqmsm/script.in /tmp/tmdebmcqmsm/script.out /tmp/tmdebmcqmsm/string_args.txt
[2022-10-18, 01:30:02 UTC] {process_utils.py:183} INFO - Output:
[2022-10-18, 01:30:03 UTC] {process_utils.py:187} INFO - WARNING:root:/opt/airflow/logs/scheduler/latest already exists as a dir/file. Skip creating symlink.
[2022-10-18, 01:30:06 UTC] {process_utils.py:187} INFO - startdate: 2022-10-17T00:00:00.000 enddate: 2022-10-17T23:59:59.000
[2022-10-18, 01:30:06 UTC] {process_utils.py:187} INFO - [2022-10-18 01:30:06,223] {credentials.py:1108} INFO - Found credentials from ......
[2022-10-18, 01:30:17 UTC] {python.py:177} INFO - Done. Returned value was: [True, 'schema.table_sql']
[2022-10-18, 01:30:17 UTC] {taskinstance.py:1401} INFO - Marking task as SUCCESS. dag_id=my_dag, task_id=my_dag_task, execution_date=20221017T013000, start_date=20221018T013000, end_date=20221018T013017
[2022-10-18, 01:30:17 UTC] {local_task_job.py:164} INFO - Task exited with return code 0
[2022-10-18, 01:30:17 UTC] {local_task_job.py:273} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
### What you think should happen instead
The DAG run should start at its scheduled time, even if it then fails.
### How to reproduce
_No response_
### Operating System
Ubuntu 20.04 LTS
### Versions of Apache Airflow Providers
IMAGE extended: `apache/airflow:2.4.1-python3.8`
### Deployment
Other Docker-based deployment
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)