[
https://issues.apache.org/jira/browse/GUACAMOLE-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paweł Grabczak updated GUACAMOLE-2221:
--------------------------------------
Summary: RDP busy-loop on transport failure (was: Infinity loop on RDP
transport-layer fail)
> RDP busy-loop on transport failure
> ----------------------------------
>
> Key: GUACAMOLE-2221
> URL: https://issues.apache.org/jira/browse/GUACAMOLE-2221
> Project: Guacamole
> Issue Type: Bug
> Components: guacamole-server
> Affects Versions: 1.6.0
> Environment: FreeBSD 14.3
> Guacamole server 1.6.0
> Reporter: Paweł Grabczak
> Priority: Major
>
> Hello,
> I believe I've found a bug introduced (or at least exposed) after upgrading
> to Guacamole Server 1.6. I've been using guacd for a long time, and after
> upgrading from 1.4 to 1.6 I noticed an issue with the RDP protocol.
> When an RDP connection hits a transport-level failure (for example, a
> certificate-related error), the guacd process never exits and starts
> consuming 100% CPU. With a native FreeRDP client, the connection fails
> correctly and exits, and the client logs show:
> {code:java}
> ERRCONNECT_CONNECT_TRANSPORT_FAILED [0x0002000D]{code}
> With guacd, it looks like it enters an infinite loop. In guacd debug logs I
> see the following repeating continuously:
> {code:java}
> guacd[40267]: DEBUG: SVC "rdpdr" connected.
> guacd[40267]: DEBUG: SVC "rdpsnd" connected.
> guacd[40267]: DEBUG: Loading Dynamic Virtual Channel disp
> guacd[40267]: DEBUG: Loading Dynamic Virtual Channel rdpgfx
> guacd[40267]: DEBUG: Handler not found for ""
> guacd[40267]: DEBUG: Handler not found for ""
> guacd[40267]: DEBUG: Handler not found for ""
> guacd[40267]: DEBUG: BIO_read returned a system error 0: No error: 0
> guacd[40267]: DEBUG: transport_read_layer:freerdp_set_last_error_ex
> ERRCONNECT_CONNECT_TRANSPORT_FAILED [0x0002000D]
> guacd[40267]: DEBUG: Handler not found for ""
> guacd[40267]: DEBUG: Handler not found for ""
> guacd[40267]: DEBUG: Handler not found for ""
> guacd[40267]: DEBUG: Handler not found for ""
> guacd[40267]: DEBUG: Handler not found for ""{code}
> To understand what guacd was doing while spinning, I captured system calls.
> The trace shows repeated poll(..., 0) calls (zero timeout) and read()
> returning EAGAIN, which explains the tight loop and 100% CPU usage:
> {code:java}
> 19143: gettimeofday({ 1770992548.370728 },0x0) = 0 (0x0)
> 19143: gettimeofday({ 1770992548.370771 },0x0) = 0 (0x0)
> 19143: poll({ 13/POLLIN },1,0) = 0 (0x0)
> 19143: gettimeofday({ 1770992548.370840 },0x0) = 0 (0x0)
> 19143: gettimeofday({ 1770992548.370873 },0x0) = 0 (0x0)
> 19143: poll({ 5/POLLIN 9/POLLIN 86/POLLIN 14/POLLIN 12/POLLIN 13/POLLIN
> },6,0) = 1 (0x1)
>
> 19143: read(5,0x85b0fde18,8) ERR#35 'Resource
> temporarily unavailable'
> 19143: gettimeofday({ 1770992548.371020 },0x0) = 0 (0x0)
> 19143: gettimeofday({ 1770992548.371042 },0x0) = 0 (0x0)
> 19143: poll({ 13/POLLIN },1,0) = 0 (0x0)
> 19143: gettimeofday({ 1770992548.371094 },0x0) = 0 (0x0)
> 19143: gettimeofday({ 1770992548.371137 },0x0) = 0 (0x0)
> 19143: poll({ 5/POLLIN 9/POLLIN 86/POLLIN 14/POLLIN 12/POLLIN 13/POLLIN
> },6,0) = 1 (0x1)
>
> 19143: read(5,0x85b0fde18,8) ERR#35 'Resource
> temporarily unavailable'
> 19143: gettimeofday({ 1770992548.371255 },0x0) = 0 (0x0)
> ...{code}
> Root cause (as I understand it)
> From investigating the code, it looks like the error state in {{wait_result}}
> can be overwritten:
> * When the transport failure occurs, {{guac_rdp_handle_events()}} sets
> {{wait_result}} to {{{}-1{}}}.
> * However, the loop does not exit immediately.
> * In the next iteration, the {{while}} condition assigns {{wait_result}}
> from {{{}rdp_guac_client_wait_for_events(){}}}, which returns {{{}0{}}},
> effectively clearing the error flag.
> * As a result, the loop never observes the transport failure and continues
> polling, spinning the CPU.
> h3. Proposed fix
> I prepared a PR that breaks out of the loop immediately after
> {{guac_rdp_handle_events()}} returns an error, so we stop processing once
> FreeRDP reports a connection failure.
> The change fixes the issue for me, but I'm not sure whether it could
> introduce any regressions in other scenarios. I would appreciate a review and
> any suggestions for improvement.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)