Copilot commented on code in PR #12916:
URL: https://github.com/apache/trafficserver/pull/12916#discussion_r2875107964
##########
tests/gold_tests/thread_config/check_threads.py:
##########
@@ -20,26 +20,45 @@
import psutil
import argparse
import sys
+import time
def count_threads(ts_path, etnet_threads, accept_threads, task_threads,
aio_threads):
for p in psutil.process_iter(['name', 'cwd', 'threads']):
+ # Use cached info from process_iter attrs to avoid race conditions
+ # where the process exits between iteration and inspection.
+ try:
+ proc_name = p.info.get('name', '')
+ proc_cwd = p.info.get('cwd', '')
+ except (psutil.NoSuchProcess, psutil.AccessDenied):
+ continue
Review Comment:
The `try/except (psutil.NoSuchProcess, psutil.AccessDenied)` block wrapping
`p.info.get()` at lines 32–36 is incorrect. `p.info` is an ordinary Python
dict; calling `.get()` on it never raises psutil exceptions. When
`process_iter` can't fetch an attribute (e.g., `cwd` is denied), it stores the
exception object as the dict value rather than raising it. As a result:
1. The `except` clause never fires, giving a false sense of protection.
2. `proc_cwd` may silently be an exception object instead of a string, which
causes the `== ts_path` comparison to silently fail — harmless but misleading.
The correct approach here is to check whether the value stored in `p.info`
is an exception instance before using it, e.g., `proc_cwd = p.info.get('cwd',
'') if not isinstance(p.info.get('cwd'), Exception) else ''`. Alternatively,
for modern psutil versions (≥6), attribute errors are represented differently,
so consult the psutil docs for the version in use. The `try/except` block can
simply be removed since it provides no protection.
##########
tests/gold_tests/thread_config/check_threads.py:
##########
@@ -20,26 +20,45 @@
import psutil
import argparse
import sys
+import time
def count_threads(ts_path, etnet_threads, accept_threads, task_threads,
aio_threads):
for p in psutil.process_iter(['name', 'cwd', 'threads']):
+ # Use cached info from process_iter attrs to avoid race conditions
+ # where the process exits between iteration and inspection.
+ try:
+ proc_name = p.info.get('name', '')
+ proc_cwd = p.info.get('cwd', '')
+ except (psutil.NoSuchProcess, psutil.AccessDenied):
+ continue
+
# Find the pid corresponding to the ats process we started in autest.
# It needs to match the process name and the binary path.
# If autest can expose the pid of the process this is not needed
anymore.
- if p.name() == '[TS_MAIN]' and p.cwd() == ts_path:
+ if proc_name == '[TS_MAIN]' and proc_cwd == ts_path:
etnet_check = set()
accept_check = set()
task_check = set()
aio_check = set()
- for t in p.threads():
+ try:
+ threads = p.threads()
+ except (psutil.NoSuchProcess, psutil.AccessDenied):
+ sys.stderr.write(f'Process {p.pid} disappeared while reading
threads.\n')
+ return 1
Review Comment:
When a process disappears while reading threads (line 52), `count_threads`
returns 1, and the retry logic in `main()` will retry because `result == 1`.
However, the diagnostic message printed on line 51 only describes the
disappearance — the retry message printed on line 161 then says "process not
found, retrying in 2s..." which is misleading (the process was found but then
disappeared). On a subsequent attempt, ATS has likely fully exited, so the
diagnostic output will report "No [TS_MAIN] processes found at all." rather
than explaining that the process disappeared mid-read.
Consider distinguishing between the "process not found" case (return 1) and
"process disappeared mid-inspection" case with a different return code so that
the retry message and diagnostics are accurate.
```suggestion
return 12
```
##########
tests/gold_tests/thread_config/check_threads.py:
##########
@@ -20,26 +20,45 @@
import psutil
import argparse
import sys
+import time
def count_threads(ts_path, etnet_threads, accept_threads, task_threads,
aio_threads):
for p in psutil.process_iter(['name', 'cwd', 'threads']):
+ # Use cached info from process_iter attrs to avoid race conditions
+ # where the process exits between iteration and inspection.
+ try:
+ proc_name = p.info.get('name', '')
+ proc_cwd = p.info.get('cwd', '')
+ except (psutil.NoSuchProcess, psutil.AccessDenied):
+ continue
+
# Find the pid corresponding to the ats process we started in autest.
# It needs to match the process name and the binary path.
# If autest can expose the pid of the process this is not needed
anymore.
- if p.name() == '[TS_MAIN]' and p.cwd() == ts_path:
+ if proc_name == '[TS_MAIN]' and proc_cwd == ts_path:
etnet_check = set()
accept_check = set()
task_check = set()
aio_check = set()
- for t in p.threads():
+ try:
+ threads = p.threads()
+ except (psutil.NoSuchProcess, psutil.AccessDenied):
+ sys.stderr.write(f'Process {p.pid} disappeared while reading
threads.\n')
+ return 1
+
+ for t in threads:
- # Get the name of the thread.
- thread_name = psutil.Process(t.id).name()
+ # Get the name of the thread. The thread may have exited
+ # between p.threads() and this call, so handle that.
+ try:
+ thread_name = psutil.Process(t.id).name()
+ except (psutil.NoSuchProcess, psutil.AccessDenied):
+ continue
Review Comment:
When a thread exits between `p.threads()` and the per-thread
`psutil.Process(t.id).name()` call, the thread is silently skipped via
`continue` at line 61. This means that thread is not added to the relevant
check set (e.g., `etnet_check`, `task_check`). When the final size checks run
(lines 110–121), the set will have fewer entries than expected, triggering a
"wrong count" error with exit codes 4, 6, 9, or 11.
The retry logic in `main()` only retries on exit code 1 ("process not
found"). Exit codes 4, 6, 9, or 11 are treated as definitive failures and not
retried. This creates a scenario where a transient TOCTOU race causes a
non-retried definitive failure, which is worse than the original behavior.
One option is to also retry on these exit codes, or alternatively, to count
the number of skipped threads and fall back to exit code 1 (to trigger a retry)
if any threads were skipped.
```suggestion
sys.stderr.write(f'Thread {t.id} disappeared while
reading name.\n')
return 1
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]