Copilot commented on code in PR #12916:
URL: https://github.com/apache/trafficserver/pull/12916#discussion_r2875107964


##########
tests/gold_tests/thread_config/check_threads.py:
##########
@@ -20,26 +20,45 @@
 import psutil
 import argparse
 import sys
+import time
 
 
 def count_threads(ts_path, etnet_threads, accept_threads, task_threads, 
aio_threads):
 
     for p in psutil.process_iter(['name', 'cwd', 'threads']):
 
+        # Use cached info from process_iter attrs to avoid race conditions
+        # where the process exits between iteration and inspection.
+        try:
+            proc_name = p.info.get('name', '')
+            proc_cwd = p.info.get('cwd', '')
+        except (psutil.NoSuchProcess, psutil.AccessDenied):
+            continue

Review Comment:
   The `try/except (psutil.NoSuchProcess, psutil.AccessDenied)` block wrapping 
`p.info.get()` at lines 32–36 is incorrect. `p.info` is an ordinary Python 
dict; calling `.get()` on it never raises psutil exceptions. When 
`process_iter` can't fetch an attribute (e.g., `cwd` is denied), it stores the 
exception object as the dict value rather than raising it. As a result:
   1. The `except` clause never fires, giving a false sense of protection.
   2. `proc_cwd` may silently be an exception object instead of a string, which 
causes the `== ts_path` comparison to silently fail — harmless but misleading.
   
   The correct approach here is to check whether the value stored in `p.info` 
is an exception instance before using it, e.g., `proc_cwd = p.info.get('cwd', 
'') if not isinstance(p.info.get('cwd'), Exception) else ''`. Alternatively, 
for modern psutil versions (≥6), attribute errors are represented differently, 
so consult the psutil docs for the version in use. The `try/except` block can 
simply be removed since it provides no protection.



##########
tests/gold_tests/thread_config/check_threads.py:
##########
@@ -20,26 +20,45 @@
 import psutil
 import argparse
 import sys
+import time
 
 
 def count_threads(ts_path, etnet_threads, accept_threads, task_threads, 
aio_threads):
 
     for p in psutil.process_iter(['name', 'cwd', 'threads']):
 
+        # Use cached info from process_iter attrs to avoid race conditions
+        # where the process exits between iteration and inspection.
+        try:
+            proc_name = p.info.get('name', '')
+            proc_cwd = p.info.get('cwd', '')
+        except (psutil.NoSuchProcess, psutil.AccessDenied):
+            continue
+
         # Find the pid corresponding to the ats process we started in autest.
         # It needs to match the process name and the binary path.
         # If autest can expose the pid of the process this is not needed 
anymore.
-        if p.name() == '[TS_MAIN]' and p.cwd() == ts_path:
+        if proc_name == '[TS_MAIN]' and proc_cwd == ts_path:
 
             etnet_check = set()
             accept_check = set()
             task_check = set()
             aio_check = set()
 
-            for t in p.threads():
+            try:
+                threads = p.threads()
+            except (psutil.NoSuchProcess, psutil.AccessDenied):
+                sys.stderr.write(f'Process {p.pid} disappeared while reading 
threads.\n')
+                return 1

Review Comment:
   When a process disappears while reading threads (line 52), `count_threads` 
returns 1, and the retry logic in `main()` will retry because `result == 1`. 
However, the diagnostic message printed on line 51 only describes the 
disappearance — the retry message printed on line 161 then says "process not 
found, retrying in 2s..." which is misleading (the process was found but then 
disappeared). On a subsequent attempt, ATS has likely fully exited, so the 
diagnostic output will report "No [TS_MAIN] processes found at all." rather 
than explaining that the process disappeared mid-read.
   
   Consider distinguishing between the "process not found" case (return 1) and 
"process disappeared mid-inspection" case with a different return code so that 
the retry message and diagnostics are accurate.
   ```suggestion
                   return 12
   ```



##########
tests/gold_tests/thread_config/check_threads.py:
##########
@@ -20,26 +20,45 @@
 import psutil
 import argparse
 import sys
+import time
 
 
 def count_threads(ts_path, etnet_threads, accept_threads, task_threads, 
aio_threads):
 
     for p in psutil.process_iter(['name', 'cwd', 'threads']):
 
+        # Use cached info from process_iter attrs to avoid race conditions
+        # where the process exits between iteration and inspection.
+        try:
+            proc_name = p.info.get('name', '')
+            proc_cwd = p.info.get('cwd', '')
+        except (psutil.NoSuchProcess, psutil.AccessDenied):
+            continue
+
         # Find the pid corresponding to the ats process we started in autest.
         # It needs to match the process name and the binary path.
         # If autest can expose the pid of the process this is not needed 
anymore.
-        if p.name() == '[TS_MAIN]' and p.cwd() == ts_path:
+        if proc_name == '[TS_MAIN]' and proc_cwd == ts_path:
 
             etnet_check = set()
             accept_check = set()
             task_check = set()
             aio_check = set()
 
-            for t in p.threads():
+            try:
+                threads = p.threads()
+            except (psutil.NoSuchProcess, psutil.AccessDenied):
+                sys.stderr.write(f'Process {p.pid} disappeared while reading 
threads.\n')
+                return 1
+
+            for t in threads:
 
-                # Get the name of the thread.
-                thread_name = psutil.Process(t.id).name()
+                # Get the name of the thread. The thread may have exited
+                # between p.threads() and this call, so handle that.
+                try:
+                    thread_name = psutil.Process(t.id).name()
+                except (psutil.NoSuchProcess, psutil.AccessDenied):
+                    continue

Review Comment:
   When a thread exits between `p.threads()` and the per-thread 
`psutil.Process(t.id).name()` call, the thread is silently skipped via 
`continue` at line 61. This means that thread is not added to the relevant 
check set (e.g., `etnet_check`, `task_check`). When the final size checks run 
(lines 110–121), the set will have fewer entries than expected, triggering a 
"wrong count" error with exit codes 4, 6, 9, or 11.
   
   The retry logic in `main()` only retries on exit code 1 ("process not 
found"). Exit codes 4, 6, 9, or 11 are treated as definitive failures and not 
retried. This creates a scenario where a transient TOCTOU race causes a 
non-retried definitive failure, which is worse than the original behavior.
   
   One option is to also retry on these exit codes, or alternatively, to count 
the number of skipped threads and fall back to exit code 1 (to trigger a retry) 
if any threads were skipped.
   ```suggestion
                       sys.stderr.write(f'Thread {t.id} disappeared while 
reading name.\n')
                       return 1
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to