rhizoma-atractylodis commented on issue #5101: URL: https://github.com/apache/inlong/issues/5101#issuecomment-1198096296
## Motivation

Based on my reading of the source code, the DataProxy SDK currently selects DataProxy nodes by round-robin polling (when sending messages in TCP mode) and by random selection (when sending messages in HTTP mode). Round-robin polling is not efficient enough, and random selection makes it hard to achieve load balancing.

## Changes

Use a consistent hashing algorithm in place of the existing round-robin and random selection.

## Mechanism Options

Consistent hashing algorithm with a virtual node mechanism. [Refer to the article for details on the algorithm](https://blog.csdn.net/gonghaiyu/article/details/108375298)

## Design

Based on my reading of the source code, the following need to be modified:

- `org.apache.inlong.sdk.dataproxy.network.ClientMgr.getClientByRoundRobin()`: this function obtains the DataProxy node by round-robin polling.
- `org.apache.inlong.sdk.dataproxy.http.InternalHttpSender.sendMessageWithHostInfo(List<String> bodies, String groupId, String streamId, long dt, long timeout, TimeUnit timeUnit)`: this function selects the DataProxy node by randomly picking a `HostInfo`.
- The fields of the DataProxy node class need to be updated to add information about virtual nodes.
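
To make the proposed mechanism concrete, here is a minimal sketch of a consistent hash ring with virtual nodes. It is only an illustration and is not code from the InLong SDK: the class name `ConsistentHashRing`, the `node#i` naming of virtual nodes, the use of MD5, and the choice of a string routing key are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent hash ring; not part of the InLong SDK.
class ConsistentHashRing {
    // Sorted map from a position on the ring to the physical node that owns it.
    private final TreeMap<Long, String> ring = new TreeMap<>();
    // Number of virtual nodes placed on the ring for each physical node.
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        if (virtualNodes <= 0) {
            throw new IllegalArgumentException("virtualNodes must be positive");
        }
        this.virtualNodes = virtualNodes;
    }

    // Place every virtual node of the given physical node on the ring.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove every virtual node of the given physical node.
    // Assumes no two virtual nodes collide on the same 64-bit position.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Return the owner of the first ring position at or after the key's hash,
    // wrapping around to the first position; null if the ring is empty.
    String select(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    // Build a 64-bit ring position from the first 8 bytes of the MD5 digest.
    static long hash(String s) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

In this sketch, a caller would add each DataProxy address once with `addNode` and then call `select` with some routing key (for example a combination of `groupId` and `streamId`, which is an assumption and not something decided in this proposal). The same key always maps to the same node, and removing one node only remaps the keys that node owned.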
