On Fri, Dec 22, 2023 at 6:23 AM radek.krotil.siemens.com via users <
users@subversion.apache.org> wrote:

> Hi SVN Community.
>
>
>
> I’d like to consult a problem with occasional high CPU utilization by
> mod_authz_svn. Please, do CC me and my colleague in the response, because
> we are subscribed to the list.
>
>
>
> Our Java application uses Subversion as a primary data storage and the SVN
> data are being accessed via Apache HTTPD mod_dav_svn and also by svnserve
> server on top of the same repository.
>
>
>
> We are facing a problem in one of our productive environments hosted in
> AWS. The server is running RHEL 8.8, Apache 2.4.37 with SVN 1.14.1
> binaries. Occasionally, the Apache HTTPD is acting strange, utilizing 100%
> CPU for periods of minutes to tens of minutes. This situation happens
> several times a week and causes delayed responses from the SVN server up to
> several minutes. Similar behavior is visible on several httpd server
> processes (mpm_event module is used), where each of them has tens of active
> requests being processed. When this CPU spike occurs, there several Apache
> HTTPD server processes with tens of threads busy in the following code
> related to the Subversion ACL.
>
>
>
> The access file is quite big. It contains about 20000 group definitions
> with roughly 10000 users assigned to them and about 35000 authorization
> rules for 15000 paths, and it’s size is over 5 MB. The access file is
> getting changed 100 times a day, but the CPU spikes do not correlate with
> the CPU spikes.
>
>
>
> Initially we suspected filesystem issues, but we were not able to confirm
> them. When the spike occurs, CPU I/O wait goes to zero and all the CPU is
> consumed in user space.
>
>
>
> We are looking for suggestions how to further troubleshoot the problem, or
> possible confirmation that this is a Subversion bug or inefficiency.
>
>
>
> Thank you,
>
> Radek Krotil
>
> Siemens Digital Industries Software
>
>
>
>
>
> *Pstack output example*
>
> Thread 127 (Thread 0x7f0003fcf700 (LWP 2934729)):
>
>
>
> #0  0x00007f0049b79b60 in svn_authz.get_acl_access () from
> target:/lib64/libsvn_repos-1.so.0
>
> #1  0x00007f0049b7a864 in update_user_rights () from
> target:/lib64/libsvn_repos-1.so.0
>
> #2  0x00007f0048e35d06 in hash_do_callback () from
> target:/lib64/libsvn_subr-1.so.0
>
> #3  0x00007f0058bc8ebe in apr_hash_do () from target:/lib64/libapr-1.so.0
>
> #4  0x00007f0048e35d6a in svn_iter_apr_hash () from
> target:/lib64/libsvn_subr-1.so.0
>
> #5  0x00007f0049b7a7b4 in expand_acl_callback () from
> target:/lib64/libsvn_repos-1.so.0
>
> #6  0x00007f0048e35e41 in svn_iter_apr_array () from
> target:/lib64/libsvn_subr-1.so.0
>
> #7  0x00007f0049b7c109 in svn_authz.parse () from
> target:/lib64/libsvn_repos-1.so.0
>
> #8  0x00007f0049b78f74 in svn_repos_authz_read4 () from
> target:/lib64/libsvn_repos-1.so.0
>
> #9  0x00007f004845e4a3 in get_access_conf () from
> target:/etc/httpd/modules/mod_authz_svn.so
>
> #10 0x00007f004845e705 in req_check_access () from
> target:/etc/httpd/modules/mod_authz_svn.so
>
> #11 0x00007f004845efd6 in auth_checker () from
> target:/etc/httpd/modules/mod_authz_svn.so
>
> #12 0x0000561c8b0724c8 in ap_run_auth_checker ()
>
> #13 0x0000561c8b074a9c in ap_process_request_internal ()
>
> #14 0x0000561c8b093a70 in ap_process_async_request ()
>
> #15 0x0000561c8b08fed0 in ap_process_http_connection ()
>
> #16 0x0000561c8b0862b8 in ap_run_process_connection ()
>
> #17 0x00007f004d6b9a47 in process_socket () from
> target:/etc/httpd/modules/mod_mpm_event.so
>
> #18 0x00007f004d6ba3ea in worker_thread () from
> target:/etc/httpd/modules/mod_mpm_event.so
>
> #19 0x00007f005899a1ca in start_thread () from
> target:/lib64/libpthread.so.0
>
> #20 0x00007f0058402e73 in clone () from target:/lib64/libc.so.6
>
>
>
>
>
> *TOP output looks as follows*
>
> top - 13:00:01 up 15 days,  7:09,  0 users,  load average: 129.31, 100.24,
> 53.38
>
> Tasks: 241 total,   7 running, 233 sleeping,   0 stopped,   1 zombie
>
> %Cpu(s): 95.6 us,  3.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.7 hi,  0.0
> si,  0.0 st
>
> MiB Mem : 126914.0 total,  55179.2 free,  20765.8 used,  50969.0 buff/cache
>
> MiB Swap:      0.0 total,      0.0 free,      0.0 used. 104540.6 avail Mem
>
>
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
> COMMAND
>
>  233319 apache    20   0 5818164   2.9g  13844 S 456.2   2.3 183:35.14
> httpd
>
>  233318 apache    20   0 5751168   2.7g  13844 S 443.8   2.2 127:53.12
> httpd
>
>  232848 apache    20   0 5226028   2.0g  13844 S 218.8   1.6  46:08.83
> httpd
>
>  232849 apache    20   0 5291384   2.1g  13804 S 168.8   1.7  68:34.92
> httpd
>
>  232847 apache    20   0 5291628   1.7g  13908 S 137.5   1.4  39:14.61
> httpd
>
>
>
> *And Apache server status clearly shows the number of “stuck” requests*
>
> Apache Server Status for localhost (via ::1)
>
> Server Version: Apache/2.4.37 (Red Hat Enterprise Linux) OpenSSL/1.1.1k
> SVN/1.14.1
>
> Server MPM: event
>
> Server Built: Aug 17 2023 10:52:25
> ------------------------------
>
> Current Time: Monday, 18-Dec-2023 13:00:01 UTC
>
> Restart Time: Saturday, 16-Dec-2023 05:23:02 UTC
>
> Parent Server Config. Generation: 2
>
> Parent Server MPM Generation: 1
>
> Server uptime: 2 days 7 hours 36 minutes 58 seconds
>
> Server load: 129.31 100.24 53.38
>
> Total accesses: 212201 - Total Traffic: 1.9 GB - Total Duration: 67348142
>
> CPU Usage: u27647.8 s220.14 cu338.36 cs31.99 - 14.1% CPU load
>
> 1.06 requests/sec - 10.0 kB/second - 9.4 kB/request - 317.379 ms/request
>
> 112 requests currently being processed, 0 workers gracefully restarting,
> 138 idle workers
>
> *Slot*
>
> *PID*
>
> *Stopping*
>
> *Connections*
>
> *Threads*
>
> *Async connections*
>
> *total*
>
> *accepting*
>
> *busy*
>
> *graceful*
>
> *idle*
>
> *writing*
>
> *keep-alive*
>
> *closing*
>
> 0
>
> 232847
>
> no
>
> 12
>
> yes
>
> 11
>
> 0
>
> 39
>
> 0
>
> 0
>
> 1
>
> 1
>
> 232848
>
> no
>
> 11
>
> yes
>
> 11
>
> 0
>
> 39
>
> 0
>
> 0
>
> 0
>
> 3
>
> 232849
>
> no
>
> 15
>
> yes
>
> 15
>
> 0
>
> 35
>
> 0
>
> 0
>
> 0
>
> 5
>
> 233318
>
> no
>
> 40
>
> yes
>
> 37
>
> 0
>
> 13
>
> 0
>
> 0
>
> 2
>
> 6
>
> 233319
>
> no
>
> 37
>
> yes
>
> 38
>
> 0
>
> 12
>
> 0
>
> 0
>
> 0
>
> Sum
>
> 5
>
> 0
>
> 115
>
>
>
> 112
>
> 0
>
> 138
>
> 0
>
> 0
>
> 3
>
> W___W____W_____________WWW__W_______W__W__W_W_____W_W_____WW_W__
>
> W__W____W____________W______WW______............................
>
> ......................_W__W____W_W____WWW__WW_W____W___W_____WW_
>
> _W______..................................................WWWW__
>
> WWWWW_WWWW_WWW_WWWWWW_W_WWW__W_W_WWW__WWWWWWWWWWW__WWWW_WW_W_W_W
>
> WW__WW_WW_W_WWWWW_WWWWWWWWWWWW..................................
>
> ................................................................
>
> ................................................................
>
> ................................................................
>
> ................................................................
>
> ................................................................
>
> ................................................................
>
> ................................
>
> Scoreboard Key:
> "*_*" Waiting for Connection, "*S*" Starting up, "*R*" Reading Request,
> "*W*" Sending Reply, "*K*" Keepalive (read), "*D*" DNS Lookup,
> "*C*" Closing connection, "*L*" Logging, "*G*" Gracefully finishing,
> "*I*" Idle cleanup of worker, "*.*" Open slot with no current process
>
> *Srv*
>
> *PID*
>
> *Acc*
>
> *M*
>
> *CPU*
>
> *SS*
>
> *Req*
>
> *Dur*
>
> *Conn*
>
> *Child*
>
> *Slot*
>
> *Client*
>
> *Protocol*
>
> *VHost*
>
> *Request*
>
> *0-1*
>
> 232847
>
> 0/20/27
>
> *W*
>
> 1796.47
>
> 183
>
> 0
>
> 77128
>
> 0.0
>
> 0.25
>
> 0.28
>
> 10.X.X.239
>
> http/1.1
>
> xyz:443
>
> PROPFIND /repo/.polarion/user-management/users/…
>
> *0-1*
>
> 232847
>
> 0/23/32
>
> *W*
>
> 1761.15
>
> 201
>
> 0
>
> 190003
>
> 0.0
>
> 0.18
>
> 0.23
>
> 10.X.X.239
>
> http/1.1
>
> xyz:443
>
> GET /repo/!svn/bc/1731720/External/…
>
> *0-1*
>
> 232847
>
> 0/24/35
>
> *W*
>
> 1752.23
>
> 182
>
> 0
>
> 81788
>
> 0.0
>
> 0.31
>
> 0.36
>
> 10.X.X.239
>
> http/1.1
>
> xyz:443
>
> GET /repo/!svn/bc/1743998/External/…
>
>
>


Hello,

In order to help narrow down the cause:

You mentioned that SVN access is via both Apache HTTPD mod_dav_svn and
svnserve. Regarding the slow responses and 100% CPU utilization for
many minutes: Does this problem occur only when accessing SVN through
httpd? Does it ever occur when accessing SVN through svnserve?

Does it appear to affect all users equally?

Has this problem occurred consistently for a long period of time, or
rather is this a new problem that began recently? If it began
recently, was it sudden, or did it start small and grow worse over
time? Could the time the issue started be correlated with any factors
that might be significant? (e.g., significant change in size or
complexity of the authorization file, software upgrades, etc.)

In your pstack output example, you showed that execution was in
svn_authz.get_acl_access(). Has this consistently been the place where
execution is getting "stuck" for a long time? Or, rather, does it vary
through parts of the system?

(Regarding svn_authz.get_acl_access(), I studied this function a few
days ago and if I understand correctly, there may be an opportunity
for a local optimization there. I don't know how much impact that
would make -- it depends on how much time we're really spending there,
and it would be difficult to measure without reproducing all the
circumstances.)

In particular, it would be useful to know whether svnserve is affected
or only httpd/mod_dav_svn...

Thanks,
Nathan

Reply via email to