o-nikolas commented on code in PR #39650:
URL: https://github.com/apache/airflow/pull/39650#discussion_r1602123572
##########
airflow/task/task_runner/standard_task_runner.py:
##########
@@ -53,6 +55,11 @@ def start(self):
else:
self.process = self._start_by_exec()
+ if self.process:
+ log_reader = threading.Thread(target=self._read_task_utilization)
Review Comment:
`log_reader` is a misleading name for this, since it isn't reading any logs.
Maybe `resource_monitor`?
##########
airflow/task/task_runner/standard_task_runner.py:
##########
@@ -53,6 +55,11 @@ def start(self):
else:
self.process = self._start_by_exec()
+ if self.process:
Review Comment:
Do we want to add an option to enable/disable this feature? It's adding a
new thread to every task execution which I could see some users not loving.
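A rough sketch of what an opt-in guard could look like, assuming a hypothetical boolean setting (passed in as `enabled` here; no such option exists in this PR or in Airflow today) and a daemon thread with a stop event so the monitor cannot outlive the task. The helper name `start_resource_monitor` is also made up for illustration:

```python
import threading


def start_resource_monitor(target, enabled, stop_event):
    """Run `target(stop_event)` on a daemon thread, or do nothing if disabled.

    `enabled` stands in for a hypothetical config option; it is not an
    existing Airflow setting.
    """
    if not enabled:
        # Feature switched off: no extra thread is created for this task.
        return None
    thread = threading.Thread(
        target=target,
        args=(stop_event,),
        name="resource_monitor",
        daemon=True,  # do not block interpreter shutdown
    )
    thread.start()
    return thread
```

The caller would keep the returned thread and set `stop_event` when the task process terminates, so users who don't want the extra thread pay nothing for it.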
##########
airflow/task/task_runner/standard_task_runner.py:
##########
@@ -186,3 +193,12 @@ def get_process_pid(self) -> int:
if self.process is None:
raise RuntimeError("Process is not started yet")
return self.process.pid
+
+ def _read_task_utilization(self):
+ while True:
+ dag_id = self._task_instance.dag_id
+ task_id = self._task_instance.task_id
+ mem_usage = self.process.memory_percent()
+ cpu_usage = self.process.cpu_percent(interval=1)
Review Comment:
Interesting, so the blocking `interval=1` call is the only thing that
rate-limits this `while` loop.
I see this in the docs:
> When interval is > 0.0 compares process times to system CPU times elapsed
> before and after the interval (blocking). When interval is 0.0 or None compares
> process times to system CPU times elapsed since last call, returning
> immediately. That means the first time this is called it will return a
> meaningless 0.0 value which you are supposed to ignore. **In this case is
> recommended for accuracy that this function be called a second time with at
> least 0.1 seconds between calls.**
Should we follow that recommendation? Also, isn't 1 second a bit too tight
of a loop? I think that's going to be quite spammy.
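Something along these lines might work, as a sketch only: prime `cpu_percent` with a non-blocking call, discard that first value, then sample on a longer, configurable interval. `sample_utilization`, `report`, and the 5-second default are made-up for illustration, and `process` is assumed to be a `psutil.Process`-like object exposing `cpu_percent(interval=None)` and `memory_percent()`:

```python
import threading


def sample_utilization(process, stop_event, report, interval=5.0):
    """Call report(cpu, mem) every `interval` seconds until stop_event is set."""
    # The first non-blocking call only establishes a baseline; its 0.0
    # result is meaningless and is discarded.
    process.cpu_percent(interval=None)
    # Event.wait doubles as an interruptible sleep: it returns True as soon
    # as stop_event is set, which ends the loop promptly.
    while not stop_event.wait(interval):
        cpu = process.cpu_percent(interval=None)  # usage since the previous call
        mem = process.memory_percent()
        report(cpu, mem)
```

Because the sleep is `stop_event.wait(interval)` rather than the blocking `interval=1` argument, the sampling rate becomes tunable and the loop has an exit condition instead of `while True`. Handling for the monitored process exiting mid-sample is left out here.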