JuJinPark commented on issue #3221: URL: https://github.com/apache/hertzbeat/issues/3221#issuecomment-2804792782
@tomsun28 This is one possible suggestion for a new architecture:

<img width="1149" alt="Image" src="https://github.com/user-attachments/assets/8f744f7d-4df1-485b-b95d-822c81bfd819" />

### Brief Explanation

1. Shared Storage (e.g., Redis):
   - Register active collectors and their heartbeats
   - Maintain a consistent hash ring (updated when the collector list changes)
2. Queues (e.g., Kafka or Redis) for Communication:
   - JobDispatchQueue: Distribute collection jobs (1 topic/partition per collector)
   - MetricJobQueue: Receive collected metric results
3. Manager Becomes Stateless:
   - Periodically reads the collector list and hash ring from shared storage
   - Assigns jobs by publishing to the appropriate queue
   - Pulls results, runs alarming logic, and writes to history DB
4. Collector Becomes Async Worker:
   - Subscribes to its own topic/queue
   - Pulls jobs, collects metrics, sends results to MetricJobQueue

### Considerations

- Ensure consistent hash ring state across Manager instances
- Handle possible job duplication or misrouting during failover or rebalancing
- Adds infrastructure complexity (Kafka, Redis need to be highly available)
- Potential increase in end-to-end latency compared to current push model

Please treat this as just one suggestion to start a discussion. 😄
I understand that adopting a new architecture would require significant changes, testing, and collaboration from many contributors.

Looking forward to hearing thoughts from your team and other contributors 🙌
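
To make points 1 to 3 more concrete, here is a minimal Python sketch of how a stateless Manager might derive the active collector list from heartbeats, build a consistent hash ring, and pick a per-collector dispatch queue for each job. Everything in it is an assumption for discussion only: the names `HashRing`, `active_collectors`, `dispatch`, the `job-dispatch.<collector>` queue naming, the 30-second heartbeat TTL, and the 100 virtual nodes per collector are hypothetical, and plain dicts stand in for Redis and Kafka. It is not HertzBeat's current code.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(key: str) -> int:
    """Stable 64-bit hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, collectors, vnodes=100):
        # Sorting the input makes every Manager instance build the same ring
        # from the same collector set, regardless of the order it was read in.
        points = []
        for collector in sorted(set(collectors)):
            for i in range(vnodes):
                points.append((_hash(f"{collector}#{i}"), collector))
        points.sort()
        self._points = points
        self._keys = [h for h, _ in points]

    def owner(self, job_id: str) -> str:
        """Return the collector owning the first ring point after the job's hash."""
        if not self._points:
            raise LookupError("no active collectors")
        idx = bisect.bisect_right(self._keys, _hash(job_id)) % len(self._points)
        return self._points[idx][1]


def active_collectors(heartbeats, now, ttl=30.0):
    """Keep collectors whose last heartbeat is at most `ttl` seconds old.

    `heartbeats` maps collector name -> last heartbeat timestamp; in the
    proposal this would live in shared storage such as Redis.
    """
    return [name for name, ts in heartbeats.items() if now - ts <= ttl]


def dispatch(job_ids, ring):
    """Group jobs by the (hypothetical) per-collector dispatch queue name."""
    queues = defaultdict(list)
    for job_id in job_ids:
        queues[f"job-dispatch.{ring.owner(job_id)}"].append(job_id)
    return dict(queues)


# Example: collector-c has missed its heartbeat window and gets no jobs.
heartbeats = {"collector-a": 100.0, "collector-b": 95.0, "collector-c": 20.0}
ring = HashRing(active_collectors(heartbeats, now=110.0))
queues = dispatch([f"job-{n}" for n in range(1000)], ring)
```

Because every Manager instance would compute the ring deterministically from the same collector list, two Managers reading the same shared state agree on job ownership. When a collector joins, only the jobs that land on its new ring points move, which limits rebalancing. The sketch does not address the duplication or misrouting window while Managers briefly see different collector lists.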
