ema opened a new issue #7423:
URL: https://github.com/apache/trafficserver/issues/7423
While investigating suspiciously high system CPU usage on a few servers
running ATS, we observed that the traffic_server process was causing more than
50 thousand **mmap** calls per second, all of which with length 65536.
```
# pid 31316 is traffic_server
$ sudo bpftrace -e 'kprobe:do_mmap /pid == 31316/ { @[arg2]=count(); }
interval:s:1 { exit(); }'
Attaching 2 probes...
@[65536]: 64988
```
By looking at LuaJIT's source code we found that the function
**mcode_alloc_at** is responsible for calling mmap, so we obtained a few
backtraces for that function with the following SystemTap script:
```
probe
process("/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0").function("mcode_alloc_at")
{
print_ubacktrace()
exit()
}
```
The script must be executed as follows to properly include all symbols:
```
$ sudo stap -v -d /usr/bin/traffic_server -d
/usr/lib/trafficserver/modules/tslua.so mcode_alloc_at.stp
```
Most backtraces looked either like this:
```
0x7fb44df01338 : mcode_allocarea+0xb8/0x160
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df19d54 : lj_asm_trace+0x25b4/0x3460
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df381d1 : trace_state+0x201/0x1340
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def2146 : lj_vm_cpcall+0x4a/0x5e
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def864b : lj_trace_ins+0x5b/0x70
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df410dc : lj_dispatch_call+0x21c/0x250
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def37ba : lj_vm_hotcall+0x1e/0x43
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df3f22c : lua_pcall+0x5c/0x80
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb52800d9a3 : globalHookHandler+0x313/0x450
[/usr/lib/trafficserver/modules/tslua.so]
```
Or like this:
```
0x7fb44df01338 : mcode_allocarea+0xb8/0x160
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df19d54 : lj_asm_trace+0x25b4/0x3460
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df381d1 : trace_state+0x201/0x1340
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def2146 : lj_vm_cpcall+0x4a/0x5e
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def864b : lj_trace_ins+0x5b/0x70
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df34e17 : lj_dispatch_ins+0x177/0x1e0
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def3741 : lj_vm_inshook+0x2d/0x44
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44df48ec1 : lj_cf_string_gsub+0x2e1/0x440
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb44def1dd7 : lj_BC_FUNCC+0x34/0x59
[/usr/lib/x86_64-linux-gnu/libluajit-5.1.so.2.1.0]
0x7fb52801a0fe : ts_lua_http_cont_handler+0x15e/0x410
[/usr/lib/trafficserver/modules/tslua.so]
```
I think the former traces are from tslua used as a global plugin, the latter
for remap rules.
Almost all our servers were affected by this, with the exception of a
handful were traffic_server almost never called **mmap**. We do not have any
valid theory explaining the difference, given that the servers run the same
software stack and configuration.
However, the discrepancy allowed us to observe that on systems **not**
calling mmap, the various Lua JIT trace calls (eg: **lj_trace_ins**) were also
not called. We hence tried to disable the JIT compiler and only rely on the
interpreter by adding **jit.off(true, true)** to all our Lua scripts. The
result was that indeed mmap stopped being called, system CPU usage went down
significantly, and the performance of ATS increased visibly.

We think that what might be happening is that the Lua JIT compiler keeps on
identifying hot code and compiling it, instead of retaining and reusing the
built code, or something along those lines. It would be great to:
- better understand why the compiler is kicking in so often
- have a configuration setting to turn off the compiler, similarly to what
luajit's **-joff** does: https://luajit.org/running.html
In **plugin.config** we have:
```
tslua.so /etc/trafficserver/lua/default.lua --enable-reload
```
**remap.config**: https://phabricator.wikimedia.org/P13769
All lua scripts
[here](https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/ab6958815d4d346ecfcca28724cd6d91a33d9eaf/modules/profile/files/trafficserver/).
Our bug tracking system has [further details on the
investigation](https://phabricator.wikimedia.org/T265625).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]