michael-s-molina commented on PR #34826:
URL: https://github.com/apache/superset/pull/34826#issuecomment-3849113231

   > So I think we need to revisit the architecture to come up with a design 
where we can coordinate connection pooling across the whole worker fleet.
   
   Agree with @villebro. To illustrate the problem in more detail:
   
                                                                              
                                                                                
                                    
   ####  1. Connection Pool Multiplication                                      
                                        
                                                                                
                                    
     In `SINGLETON` mode, each pod/Celery worker has its own `EngineManager` 
instance:                                  
                                                                                
                                    
     Pod 1: EngineManager → Pool(size=5) → 5 connections                        
                                    
     Pod 2: EngineManager → Pool(size=5) → 5 connections                        
                                    
     Pod 3: EngineManager → Pool(size=5) → 5 connections                        
                                    
     ...                                                                        
                                    
                                                                                
                                    
     With 10 pods and pool size 5, you could hit 50 connections to the 
database, not 5. This can easily exhaust     
     database connection limits, especially for managed databases with strict 
limits.                               
                                                                                
                                    
     #### 2. SSH Tunnel Multiplication                                          
                                         
                                                                                
                                    
     Same problem with SSH tunnels:                                             
                                    
                                                                                
                                    
   ```
   # Each pod creates its own tunnel cache   
   self._tunnels: dict[TunnelKey, SSHTunnelForwarder] = {}   
   ```                                                     
                                                                                
                                    
     If you have 20 Celery workers across multiple pods, you could have 20 
separate SSH tunnels to the same bastion 
     host. Many SSH servers limit concurrent connections per user (often 10 by 
default).                            
                                                                                
                                    
     #### 3. Celery Prefork Workers Make It Worse                               
                                         
                                                                                
                                    
     Celery's default prefork model spawns multiple worker processes:           
                                    
                                                                                
                                    
     Pod 1:                                                                     
                                    
       └── Celery Worker (prefork, concurrency=4)                               
                                    
            ├── Process 1 → EngineManager → Pool                                
                                    
            ├── Process 2 → EngineManager → Pool                                
                                    
            ├── Process 3 → EngineManager → Pool                                
                                    
            └── Process 4 → EngineManager → Pool                                
                                    
                                                                                
                                    
     Each forked process gets its own memory space, so connection pools aren't 
shared even within a single pod.     
                                                                                
                                    
    #### 4. No Cluster-Wide Coordination                                        
                                        
                                                                                
                                    
     The cache keys are deterministic but there's no coordination:              
                                    
                                                                                
                                    
     ```
   # manager.py:258
     engine_key = self._get_engine_key(database, catalog, schema, source, 
user_id)
   ```                              
                                                                                
                                    
     Each process independently decides to create connections without knowing 
cluster-wide state.                   
                                                                                
                                    
    #### What's Missing for True Connection Pooling                             
                                        
                                                                                
                                    
     For distributed connection pooling, you'd typically need:                  
                                    
                                                                                
                                    
     1. External connection pooler like PgBouncer, ProxySQL, or cloud-native 
poolers (RDS Proxy, Cloud SQL Proxy)   
     2. Centralized state via Redis or similar for tunnel coordination          
                                    
     3. Connection limits per-pod that account for cluster size 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to