Paweł Grabczak created GUACAMOLE-2221:
-----------------------------------------

             Summary: Infinity loop on RDP transport-layer fail
                 Key: GUACAMOLE-2221
                 URL: https://issues.apache.org/jira/browse/GUACAMOLE-2221
             Project: Guacamole
          Issue Type: Bug
          Components: guacamole-server
    Affects Versions: 1.6.0
         Environment: FreeBSD 14.3
Guacamole server 1.6.0
            Reporter: Paweł Grabczak


Hello,

I believe I've found a bug introduced (or at least exposed) after upgrading to 
Guacamole Server 1.6. I've been using guacd for a long time, and after 
upgrading from 1.4 to 1.6 I noticed an issue with the RDP protocol.

When an RDP connection hits a transport-level failure (for example, a 
certificate-related error), the guacd process never exits and starts consuming 
100% CPU. With a native FreeRDP client, the connection fails correctly and 
exits, and the client logs show:
{code:java}
ERRCONNECT_CONNECT_TRANSPORT_FAILED [0x0002000D]{code}
With guacd, it looks like it enters an infinite loop. In guacd debug logs I see 
the following repeating continuously:
{code:java}
 guacd[40267]: DEBUG:    SVC "rdpdr" connected.                           
 guacd[40267]: DEBUG:    SVC "rdpsnd" connected.              
 guacd[40267]: DEBUG:    Loading Dynamic Virtual Channel disp            
 guacd[40267]: DEBUG:    Loading Dynamic Virtual Channel rdpgfx
 guacd[40267]: DEBUG:    Handler not found for ""                          
 guacd[40267]: DEBUG:    Handler not found for ""                          
 guacd[40267]: DEBUG:    Handler not found for ""                          
 guacd[40267]: DEBUG:    BIO_read returned a system error 0: No error: 0
 guacd[40267]: DEBUG:    transport_read_layer:freerdp_set_last_error_ex 
ERRCONNECT_CONNECT_TRANSPORT_FAILED [0x0002000D]
 guacd[40267]: DEBUG:    Handler not found for ""
 guacd[40267]: DEBUG:    Handler not found for ""
 guacd[40267]: DEBUG:    Handler not found for ""
 guacd[40267]: DEBUG:    Handler not found for ""
 guacd[40267]: DEBUG:    Handler not found for ""{code}
To understand what guacd was doing while spinning, I captured system calls. The 
trace shows repeated poll(..., 0) calls (zero timeout) and read() returning 
EAGAIN, which explains the tight loop and 100% CPU usage:
{code:java}
  19143: gettimeofday({ 1770992548.370728 },0x0)   = 0 (0x0)                
  19143: gettimeofday({ 1770992548.370771 },0x0)   = 0 (0x0)                
  19143: poll({ 13/POLLIN },1,0)                   = 0 (0x0)                
  19143: gettimeofday({ 1770992548.370840 },0x0)   = 0 (0x0)                
  19143: gettimeofday({ 1770992548.370873 },0x0)   = 0 (0x0)
  19143: poll({ 5/POLLIN 9/POLLIN 86/POLLIN 14/POLLIN 12/POLLIN 13/POLLIN 
},6,0) = 1 (0x1)                                                                
  19143: read(5,0x85b0fde18,8)                     ERR#35 'Resource temporarily 
unavailable'                   
  19143: gettimeofday({ 1770992548.371020 },0x0)   = 0 (0x0)                
  19143: gettimeofday({ 1770992548.371042 },0x0)   = 0 (0x0)                
  19143: poll({ 13/POLLIN },1,0)                   = 0 (0x0)                
  19143: gettimeofday({ 1770992548.371094 },0x0)   = 0 (0x0)                
  19143: gettimeofday({ 1770992548.371137 },0x0)   = 0 (0x0)
  19143: poll({ 5/POLLIN 9/POLLIN 86/POLLIN 14/POLLIN 12/POLLIN 13/POLLIN 
},6,0) = 1 (0x1)                                                                
  19143: read(5,0x85b0fde18,8)                     ERR#35 'Resource temporarily 
unavailable'                   
  19143: gettimeofday({ 1770992548.371255 },0x0)   = 0 (0x0)                
...{code}
Root cause (as I understand it)

>From investigating the code, it looks like the error state in {{wait_result}} 
>can be overwritten:
 * When the transport failure occurs, {{guac_rdp_handle_events()}} sets 
{{wait_result}} to {{{}-1{}}}.

 * However, the loop does not exit immediately.

 * In the next iteration, the {{while}} condition assigns {{wait_result}} from 
{{{}rdp_guac_client_wait_for_events(){}}}, which returns {{{}0{}}}, effectively 
clearing the error flag.

 * As a result, the loop never observes the transport failure and continues 
polling, spinning the CPU.

h3. Proposed fix

I prepared a PR that breaks out of the loop immediately after 
{{guac_rdp_handle_events()}} returns an error, so we stop processing once 
FreeRDP reports a connection failure.

The change fixes the issue for me, but I'm not sure whether it could introduce 
any regressions in other scenarios. I would appreciate a review and any 
suggestions for improvement.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to