kingluo commented on issue #7949:
URL: https://github.com/apache/apisix/issues/7949#issuecomment-1277311643

   @hahayyum
   In `nginx-lua-prometheus` library, each metric instance maintains a lookup 
table to cache the full metric name of each label values combination.
   https://github.com/knyar/nginx-lua-prometheus/blob/master/prometheus.lua#L418
   
   The prometheus plugin would update metrics at the log phase of each request 
and in turn update the lookup table. Because you generates the route id in 
random string each time, so the lookup table increases indefinitely and causes 
memory leak:
   
https://github.com/apache/apisix/blob/6ae923a12fc1b7f8414b66efc053b19c7b5ddd08/apisix/plugins/prometheus/exporter.lua#L238
   
   Check my gist page for reproduce detail:
   https://gist.github.com/kingluo/1df4736665a3381ce745f26cf17181d8
   
   The systemtap script shows:
   
   ```
    T:@/opt/apisix//deps/share/lua/5.1/prometheus.lua:346
   @/opt/apisix//deps/share/lua/5.1/prometheus.lua:346
   @/opt/apisix//deps/share/lua/5.1/prometheus.lua:540
   @/opt/apisix/apisix/plugins/prometheus/exporter.lua:210
   @/opt/apisix/apisix/plugin.lua:884
   @/opt/apisix/apisix/plugin.lua:961
   @/opt/apisix/apisix/init.lua:314
   @/opt/apisix/apisix/init.lua:725
   =log_by_lua(nginx.conf:410):0
    : 0x0
           total 544320 bytes
    C:json_decode
   C:json_protect_conversion
   @/opt/apisix//deps/share/lua/5.1/resty/etcd/v3.lua:770
   @/opt/apisix/apisix/core/config_etcd.lua:123
   @/opt/apisix/apisix/core/config_etcd.lua:298
   @/opt/apisix/apisix/core/config_etcd.lua:538
   builtin#22
   @/opt/apisix/apisix/core/config_etcd.lua:516
    : 0xc
           total 358888 bytes
    builtin#177
   =coroutine_api:7
   @/opt/apisix//deps/share/lua/5.1/resty/http.lua:495
    : 0xb
           total 323136 bytes
   ```
   
   It points out the leak reason.
   
   **Note that the luajit gc does not free the unused memory immediately, so 
the output would include some memory footprint still in gc. We should check the 
codes and figure out which is fake memory leak.**
   
   **And also note that, because your url is in random string each time, so the 
pcre compiled lru cache would keep increasing util full of capacity. Those 
memory should be ignored too. That is, those memory would not increase after 
some times:**
   
   ```
    T:@/usr/local/openresty-debug/lualib/resty/core/regex.lua:559
   @/usr/local/openresty-debug/lualib/resty/core/regex.lua:559
   @/usr/local/openresty-debug/lualib/ngx/re.lua:133
   @/opt/apisix/apisix/admin/init.lua:119
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:687
   @/opt/apisix/apisix/init.lua:803
   =content_by_lua(nginx.conf:234):0
    : 0x0
           756 samples
    T:@/usr/local/openresty-debug/lualib/resty/lrucache.lua:104
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:491
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:610
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:629
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:687
   @/opt/apisix/apisix/http/route.lua:104
   @/opt/apisix/apisix/init.lua:331
   =access_by_lua(nginx.conf:347):0
    : 0x0
           186 samples
    T:@/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:455
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:491
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:610
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:629
   @/opt/apisix//deps/share/lua/5.1/resty/radixtree.lua:687
   @/opt/apisix/apisix/http/route.lua:104
   @/opt/apisix/apisix/init.lua:331
   =access_by_lua(nginx.conf:347):0
    : 0x0
           3 samples
   /
           756 samples
   \/http\/rtGjsw\/((.|\n)*)
           1 samples
   \/http\/JilThq\/((.|\n)*)
           1 samples
   ```
   
   Currently, the `nginx-lua-prometheus` library used by apisix is old. It will 
cause other errors in your scenario. I would post a bugfix and tell the author 
to publish a new release:
   
https://github.com/knyar/nginx-lua-prometheus/issues/150#issuecomment-1277219593
   
   The simple bugfix is to replace the whole lookup table with a new empty 
table when full:
   
   ```diff
   --- /opt/prometheus.lua.orig    2022-10-13 17:21:34.766568987 +0800
   +++ /opt/apisix//deps/share/lua/5.1/prometheus.lua      2022-10-13 
17:26:05.729765362 +0800
   @@ -353,6 +353,12 @@
        return nil, string.format("inconsistent labels count, expected %d, got 
%d",
                                  self.label_count, cnt)
      end
   +
   +  if self.lookup_size > self.lookup_max_size then
   +    self.lookup_size = 0
   +    self.lookup = {}
   +  end
   +
      local t = self.lookup
      if label_values then
        -- Don't use ipairs here to avoid inner loop generates trace first
   @@ -405,6 +411,9 @@
        full_name = full_metric_name(self.name, self.label_names, label_values)
      end
      t[LEAF_KEY] = full_name
   +
   +  self.lookup_size = self.lookup_size + 1
   +
      local err = self._key_index:add(full_name)
      if err then
        return nil, err
   @@ -791,6 +800,8 @@
        -- ['my.net']['200'][LEAF_KEY] = 
'http_count{host="my.net",status="200"}'
        -- ['my.net']['500'][LEAF_KEY] = 
'http_count{host="my.net",status="500"}'
        lookup = {},
   +    lookup_size = 0,
   +    lookup_max_size = 100,
        parent = self,
        -- Store a reference for logging functions for faster lookup.
        _log_error = function(...) self:log_error(...) end,
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to