rymarm opened a new issue, #2978: URL: https://github.com/apache/drill/issues/2978
One Drillbit may end up with significantly more client connections than others. Here is an example of such a case:

The chart shows that the highest number of connections per node is 21, while the lowest is only 3, a sevenfold difference. Even though query execution considers the load on all Drillbits and distributes execution accordingly, overutilizing a single Drillbit for client connections can lead to issues such as heap memory exhaustion and excessive network bandwidth usage when returning query results to the client.

The current client implementation retrieves a list of active Drillbits from Zookeeper, randomly shuffles them:

```java
java.util.Collections.shuffle(endpoints);
```

and attempts to connect to the first N nodes in order until a connection succeeds. This approach was introduced as a simple way to prevent all clients from connecting to the same Drillbit: https://issues.apache.org/jira/browse/DRILL-2512. However, it does not fully solve the problem, as one node may still end up with significantly more connections than others.

**Solution**

Every Drillbit stores its current number of client connections in Zookeeper. After retrieving the list of active Drillbits, the client calculates the average number of connections and reorders the list so that Drillbits whose connection count is below the average are tried before those above it.

The key questions I see here are:

* Is it appropriate to store information about the current number of user connections in Zookeeper?
* Should client connection load balancing be configurable and enabled by default?
* Is it beneficial to distribute all connections evenly across the entire Drill cluster?
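
To make the proposed reordering concrete, here is a minimal sketch of the client-side step described under **Solution**. It is not Drill code. The class `ConnectionBalancer`, the method `reorder`, and the plain `Map` of endpoint name to connection count are hypothetical stand-ins for whatever the client would read from Zookeeper. Two details are assumptions the proposal does not spell out: Drillbits exactly at the average are grouped with those below it, and each group is shuffled so that clients do not all pick the same Drillbit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, not part of Drill: reorders endpoints by connection load.
final class ConnectionBalancer {

  /**
   * Returns the endpoint names with those at or below the average connection
   * count first, followed by those above the average.
   */
  static List<String> reorder(Map<String, Integer> connectionCounts, Random random) {
    // Average connection count across all active endpoints (0.0 if there are none).
    double average = connectionCounts.values().stream()
        .mapToInt(Integer::intValue)
        .average()
        .orElse(0.0);

    List<String> atOrBelowAverage = new ArrayList<>();
    List<String> aboveAverage = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : connectionCounts.entrySet()) {
      // Assumption: ties with the average count as lightly loaded.
      if (entry.getValue() <= average) {
        atOrBelowAverage.add(entry.getKey());
      } else {
        aboveAverage.add(entry.getKey());
      }
    }

    // Assumption: shuffle inside each group so clients still spread out.
    Collections.shuffle(atOrBelowAverage, random);
    Collections.shuffle(aboveAverage, random);

    List<String> ordered = new ArrayList<>(atOrBelowAverage);
    ordered.addAll(aboveAverage);
    return ordered;
  }

  public static void main(String[] args) {
    // Made-up counts for illustration; 21 and 3 echo the extremes in the chart.
    Map<String, Integer> counts = new LinkedHashMap<>();
    counts.put("drillbit-1", 21);
    counts.put("drillbit-2", 3);
    counts.put("drillbit-3", 9);
    // The average is 11, so drillbit-1 is always tried last.
    System.out.println(reorder(counts, new Random()));
  }
}
```

The client would then try the returned endpoints in order, as it does today with the shuffled list. Note that the counts read by the client are a snapshot, so many clients connecting at the same moment could still favor the same lightly loaded Drillbits.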