Hi Apache team,
At redBorder, we developed rb-druid-indexer as one of our internal
services. Its goal is to simplify submitting indexing tasks in distributed
deployments. rb-druid-indexer runs as a cluster-aware service that manages
indexing tasks: it load-balances task submissions and deletions across the
Druid router(s) and provides failover, using ZooKeeper for coordination.
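To illustrate the load-balancing and failover idea, here is a minimal Python sketch. It assumes a simple round-robin strategy over a list of router URLs, skipping any router that rejects the request. The class name `RouterBalancer` and the injected `send` callback are hypothetical and are not rb-druid-indexer's actual code, whose strategy may differ.

```python
from typing import Callable, List


class RouterBalancer:
    """Hypothetical round-robin balancer over Druid router URLs.

    Not rb-druid-indexer's real implementation; a sketch of the idea only.
    """

    def __init__(self, routers: List[str]):
        if not routers:
            raise ValueError("at least one router is required")
        self.routers = list(routers)
        self._next = 0  # index of the router to try first on the next call

    def submit(self, send: Callable[[str], bool]) -> str:
        """Try each router at most once, starting at the round-robin cursor.

        `send(router_url)` is a caller-supplied function (hypothetical) that
        performs the request and returns True on success. Returns the router
        that accepted the request; raises RuntimeError if every router fails.
        """
        n = len(self.routers)
        for attempt in range(n):
            idx = (self._next + attempt) % n
            router = self.routers[idx]
            if send(router):
                # Advance the cursor past the router that succeeded.
                self._next = (idx + 1) % n
                return router
        raise RuntimeError("all Druid routers failed")
```

For example, with `RouterBalancer(["http://router1:8888", "http://router2:8888"])`, successive successful submissions alternate between the two routers, and if `router1` is down the request falls through to `router2`. Injecting `send` keeps the balancing logic independent of the HTTP client.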
Currently, rb-druid-indexer supports multiple Kafka brokers, custom field
dimensions, load balancing across Druid routers, excluded dimensions, and
several other features. You can configure rb-druid-indexer with a YAML file
such as the following:
zookeeper_servers:
  - "rb-malvarez1.node:2181"
  - "rb-malvarez3.node:2181"
  - "rb-malvarez2.node:2181"
tasks:
  - task_name: "rb_monitor"
    feed: "rb_monitor"
    spec: "rb_monitor"
    kafka_brokers:
      - "rb-malvarez1.node:9092"
      - "rb-malvarez3.node:9092"
      - "rb-malvarez2.node:9092"
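As a rough illustration of how one `tasks` entry could map onto a Druid Kafka supervisor spec, here is a hedged Python sketch. It assumes that `feed` names the Kafka topic and `spec` names the target datasource; those mappings and the helper name `build_supervisor_spec` are hypothetical, not rb-druid-indexer's documented behavior. A real supervisor spec also needs a `timestampSpec`, `dimensionsSpec`, and granularity settings, which are omitted here.

```python
from typing import Any, Dict, List


def build_supervisor_spec(task: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping from a config task entry to a partial Kafka supervisor spec.

    Assumes `feed` is the Kafka topic and `spec` is the datasource name.
    """
    brokers: List[str] = task["kafka_brokers"]
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {"dataSource": task["spec"]},
            "ioConfig": {
                "type": "kafka",
                "topic": task["feed"],
                # Kafka expects a comma-separated list of host:port pairs.
                "consumerProperties": {"bootstrap.servers": ",".join(brokers)},
            },
        },
    }
```

A complete spec built this way would be POSTed to `/druid/indexer/v1/supervisor` through a Druid router, which forwards it to the Overlord.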
Pull request: https://github.com/apache/druid-website-src/pull/530
Thx <3 :)