Hi Apache team,

At redBorder, we developed rb-druid-indexer as one of our internal services to simplify indexing task submissions in distributed systems. It runs as a cluster-compatible service that manages indexing tasks, load-balances task submissions and deletions across the Druid router(s), and uses ZooKeeper for coordination and failover.
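To make the load-balancing and failover idea concrete, here is a minimal Python sketch of rotating across routers and falling back to the next one when a submission fails. It is an illustration only, not how rb-druid-indexer is implemented: the `RoundRobin` class, the `submit_with_failover` function, and the `submit` callback are hypothetical names, and the ZooKeeper coordination step is left out entirely.

```python
from typing import Callable, List, Sequence


class RoundRobin:
    """Rotate the starting router so consecutive submissions are spread out.

    Hypothetical helper for illustration; not part of rb-druid-indexer.
    """

    def __init__(self, routers: Sequence[str]) -> None:
        if not routers:
            raise ValueError("at least one router is required")
        self._routers = list(routers)
        self._next = 0

    def order(self) -> List[str]:
        """Return all routers, starting at the next one in the rotation."""
        start = self._next
        self._next = (self._next + 1) % len(self._routers)
        return self._routers[start:] + self._routers[:start]


def submit_with_failover(routers: Sequence[str],
                         submit: Callable[[str], bool]) -> str:
    """Try each router once, in the given order.

    `submit` is an assumed callback that sends the task to one router and
    returns True on success. Returns the router that accepted the task.
    """
    tried = []
    for router in routers:
        tried.append(router)
        if submit(router):
            return router
    raise RuntimeError(f"no router accepted the task; tried {tried}")
```

A caller would build one `RoundRobin` over its router addresses and pass `order()` to `submit_with_failover` for each task, so a router that is down is skipped for that submission instead of blocking it.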
Currently, rb-druid-indexer supports multiple Kafka brokers, custom field dimensions, load balancing across Druid routers, excluded dimensions, and several other useful features. You can configure rb-druid-indexer using a YAML file like the example below:

    zookeeper_servers:
      - "rb-malvarez1.node:2181"
      - "rb-malvarez3.node:2181"
      - "rb-malvarez2.node:2181"
    tasks:
      - task_name: "rb_monitor"
        feed: "rb_monitor"
        spec: "rb_monitor"
        kafka_brokers:
          - "rb-malvarez1.node:9092"
          - "rb-malvarez3.node:9092"
          - "rb-malvarez2.node:9092"

https://github.com/apache/druid-website-src/pull/530

Thx <3 :)