c-taylor opened a new issue #7378:
URL: https://github.com/apache/trafficserver/issues/7378


   I wanted to share some really unusual system calls on ET_AIO threads. I 
think it's odd enough to warrant further investigation as the cause/impact is 
uncertain.
   
   When looking at system calls on a running system, even with light or no 
activity, the following can be observed.
   
   ```
   grep "ETIMEDOUT (Connection timed out)" strace.out |head
   34908 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010099>
   34909 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010179>
   34910 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010072>
   34911 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010143>
   34912 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010094>
   ```
   The interesting thing about all of the PIDs is that they are ET_AIO threads, 
which in theory manage no connections.
   
   ## Reproduction
   
   Tested version: 8.1.x
   
   Start traffic_server by hand:
   `strace -fT -o strace.out traffic_manager &`
   
   No 'load' is required, but I all sent through about 4-5 requests that would 
typically cache.
   strace capture and traffic_(server|manager) stopped after 60-120s.
   
   ### Find ET_AIO threads
   e.g. `grep "PR_SET_NAME, \"\[ET_AIO" strace.out |awk '{print $1}' |sort -n 
|uniq -c`
   
   ```
   34803 clone(child_stack=0x7fe9109fdf30, 
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
 parent_tidptr=0x7fe9109fe9d0, tls=0x7fe9109fe700, child_tidptr=0x7fe9109fe9d0) 
= 34908 <0.000046>
   34803 mmap(NULL, 2097152, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0 <unfinished ...>
   34908 set_robust_list(0x7fe9109fe9e0, 24) = 0 <0.000014>
   34803 <... mmap resumed>)               = 0x7fe90e000000 <0.000038>
   34908 prctl(PR_SET_NAME, "[ET_AIO 1:131]") = 0 <0.000011>
   34908 mmap(NULL, 2097152, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7fe90de00000 <0.000057>
   34908 set_mempolicy(MPOL_INTERLEAVE, [0x0000000000000001], 65) = 0 <0.000012>
   34908 futex(0x7fe90ea631ac, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
2, {tv_sec=1605182243, tv_nsec=587387395}, 0xffffffff <unfinished ...>
   ```
   
   ### Find "Connection Timed Out"
   ```
   $ grep "ETIMEDOUT (Connection timed out)" strace.out |head
   34908 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010099>
   34909 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010179>
   34910 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010072>
   34911 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010143>
   34912 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010094>
   34913 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010080>
   34907 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010025>
   34914 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010057>
   34916 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010067>
   34917 <... futex resumed>)              = -1 ETIMEDOUT (Connection timed 
out) <0.010072>
   ```
   
   ## Observations
   In my own tests there is near perfect alignment on PIDs regularly reporting 
"Connection timed out" and ET_AIO threads.
   They also don't appear to subside, but continue even on long running, near 
idle systems.
   
   * What is the origin of these messages?
   * Why is ET_AIO 'timing out' any connections?
   * Did we clone more than intended?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to