Hello, I work closely with druid.io and one of the main pain points for any Druid deployment is handling the real-time streaming component. What I, personally, would *like* to have is the streaming orchestration and streaming state handled by a runner which specializes in such things, and allow Druid to focus on the lightning fast ad-hoc query side.
A natural contender for such a setup would be a Beam based solution with a Druid segment Sink. The trouble we are having pursuing such a setup is two fold: 1. Many Druid setups run lambda-style pipes to backfill late or wrong data, so the sink needs to call out to the Druid cluster for data version locking and orchestration. Access must be available from the sink to the Druid Overlord and/or Coordinator, or potentially some other task-specific jvm in the cluster. 2. Druid segments are queryable while they are being built. This means that the Druid cluster (Broker specifically) must be able to discover and issue *RPC queries* against the Sink on the partial data it has accumulated. This puts an extra dynamic load on the Sink jvm, but in my own experience that extra load is small compared to the load of indexing the incoming data. The key desire to use a stream-native framework is that the Druid MiddleManager/Peon setup (the streaming task infrastructure for Druid) has problems recovering from failure, recovering from upgrade easily, handling late data well, and dynamic horizontal scaling. The Kafka Indexing Extension <http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html> handles some of these things, but still doesn't quite operate like a stream-native solution, and is only available for Kafka. What is not clear to me is if such a feature set would be in the intended scope of Beam as opposed to having some other streaming data catcher service that Beam simply feeds into (aka, a Tranquility <https://github.com/druid-io/tranquility> sink). Having a customizable, accessible and stateful sink with good commit semantics seems like it should be something Beam could support natively, but the "accessible" part is where I need some guidance on if that is in scope for Beam. Can the dev list provide some insight to if there are any other Sinks that strive to be accessible at run-time from non-Beam components? Also, is such a use case something that is desired to be a part of what Beam does, or would it be best outside of Beam? Thank you, Charles Allen