JorgenG commented on issue #27345:
URL: https://github.com/apache/airflow/issues/27345#issuecomment-1295204855
Hey @potiuk, thanks a lot for responding. I realise my vocabulary for
Airflow logs might be a bit inaccurate.
I'll try to be more specific about what I observe. First of all, I am not
100% what you mean about about parsing of top-level code in the DAG.
Essentially what we observe is that log lines _inside a single task
execution_ appears to be duplicated.
Below is an excerpt from a task that spins up an ECS container.
```
*** Reading remote log from Cloudwatch log_group:
/stacks/intranet/airflow-tasks log_stream:
dag_id=dbt-events-incremental/run_id=scheduled__2022-10-28T15_15_00+00_00/task_id=dbt_events_incremental/attempt=1.log.
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance:
dbt-events-incremental.dbt_events_incremental
scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance:
dbt-events-incremental.dbt_events_incremental
scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance:
dbt-events-incremental.dbt_events_incremental
scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC] Dependencies all met for <TaskInstance:
dbt-events-incremental.dbt_events_incremental
scheduled__2022-10-28T15:15:00+00:00 [queued]>
[2022-10-28, 15:21:36 UTC]
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC]
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC] Starting attempt 1 of 1
[2022-10-28, 15:21:36 UTC] Starting attempt 1 of 1
[2022-10-28, 15:21:36 UTC]
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC]
--------------------------------------------------------------------------------
[2022-10-28, 15:21:36 UTC] Executing <Task(DbtOperatorV2):
dbt_events_incremental> on 2022-10-28 15:15:00+00:00
[2022-10-28, 15:21:36 UTC] Executing <Task(DbtOperatorV2):
dbt_events_incremental> on 2022-10-28 15:15:00+00:00
...
[2022-10-28, 15:21:37 UTC] Starting ECS Task Log Fetcher
[2022-10-28, 15:21:37 UTC] Starting ECS Task Log Fetcher
[2022-10-28, 15:21:37 UTC]
/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/operators/ecs.py:437:
DeprecationWarning: This class is deprecated. Please use
`airflow.providers.amazon.aws.hooks.ecs.EcsTaskLogFetcher`.
self.task_log_fetcher = self._get_task_log_fetcher()
[2022-10-28, 15:22:07 UTC] Using connection ID 'aws_default' for task
execution.
[2022-10-28, 15:22:38 UTC] [2022-10-28, 15:22:15 UTC] + set -o pipefail
[2022-10-28, 15:22:38 UTC] [2022-10-28, 15:22:15 UTC] + set -o pipefail
...
[2022-10-28, 15:54:54 UTC] ECS Task has been successfully executed
[2022-10-28, 15:54:54 UTC] ECS Task has been successfully executed
[2022-10-28, 15:54:54 UTC] Marking task as SUCCESS.
dag_id=dbt-events-incremental, task_id=dbt_events_incremental,
execution_date=20221028T151500, start_date=20221028T152136,
end_date=20221028T155454
[2022-10-28, 15:54:54 UTC] Marking task as SUCCESS.
dag_id=dbt-events-incremental, task_id=dbt_events_incremental,
execution_date=20221028T151500, start_date=20221028T152136,
end_date=20221028T155454
[2022-10-28, 15:54:54 UTC] Task exited with return code 0
[2022-10-28, 15:54:54 UTC] 0 downstream tasks scheduled from follow-on
schedule check
```
Upon looking through this in detail, I actually find, as you say, that there
are lines that are not duplicated.
I.e.
```
[2022-10-28, 15:21:37 UTC]
/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/operators/ecs.py:437:
DeprecationWarning: This class is deprecated. Please use
`airflow.providers.amazon.aws.hooks.ecs.EcsTaskLogFetcher`.
self.task_log_fetcher = self._get_task_log_fetcher()
[2022-10-28, 15:22:07 UTC] Using connection ID 'aws_default' for task
execution.
...
[2022-10-28, 15:54:54 UTC] Task exited with return code 0
[2022-10-28, 15:54:54 UTC] 0 downstream tasks scheduled from follow-on
schedule check
```
So based on your comment, I assume these are probably lines from DAG parsing
then?
I realise now I was wrong stating that all log lines were duplicated, I
didn't pay enough attention when looking at it. (Thanks for the hint 🙂 )
To also further elaborate, I don't see duplication of logs in i.e. the
webserver or scheduled cloudwatch outputs.
This is in no way critical for us, but I would love to help out further to
get it fixed. Is there any way we can override some configuration or similar to
verify that the change you referenced might be the cause? I.e. override
propagation or similar?
Best, Jørgen
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]