drcrallen opened a new pull request #6799: Druid watermark initial contribution
URL: https://github.com/apache/incubator-druid/pull/6799

This extension is a self-contained new service which tracks the "watermark" progression of data through a Druid deployment. The original author was @clintropolis.

<img width="1267" alt="screen shot 2019-01-02 at 3 25 46 pm" src="https://user-images.githubusercontent.com/8213081/50620166-ddd1ea00-0eb2-11e9-9149-ddd1ace496fd.png">

A "watermark" is an indicator of how far along, or how complete, data is in event time. This implementation provides several watermark series for monitoring when data processing has passed certain stages.

# stable

The `stable_*` series of watermarks indicates when data is believed to be contiguous for a span of time. One example is when real-time tasks have handed off all segments to historical nodes for a particular span of time.

# batch

The `batch_*` series of watermarks indicates when data has been filled by a batch job. The detector for this series is somewhat hacky, since it only looks for partition specs which are commonly created by batch jobs.

# low

The `*_low` series of watermarks indicates the lowest *contiguous* timestamp. It will not skip over gaps in data unless manually overridden.

# high

The `*_high` series of watermarks indicates the timestamp *furthest along in time*.

So looking back in history from a `*_low` watermark should yield contiguous data until `mintime`, and any split between `*_low` and `*_high` is a good indicator of missing data.

We've been using this extension in production for quite a while now and it has worked pretty well.
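
To make the `*_low` / `*_high` distinction concrete, here is a minimal sketch of the idea. It is not the extension's actual code: it assumes segments can be reduced to half-open `(start, end)` time intervals, and the function name `low_high_watermarks` is hypothetical. Manual overrides are not modeled.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Assumption: each segment is represented as a half-open interval [start, end).
Interval = Tuple[datetime, datetime]


def low_high_watermarks(intervals: List[Interval]) -> Optional[Tuple[datetime, datetime]]:
    """Return (low, high) for a set of segment intervals, or None if empty.

    low:  end of the contiguous run of data starting at the earliest interval
          (the "mintime"); it never skips over a gap.
    high: the furthest end timestamp seen, regardless of gaps.
    """
    if not intervals:
        return None

    ordered = sorted(intervals)  # sort by start time
    low = ordered[0][1]
    high = ordered[0][1]
    contiguous = True

    for start, end in ordered[1:]:
        if contiguous and start <= low:
            # Interval touches or overlaps the contiguous run: extend it.
            low = max(low, end)
        else:
            # First gap found: low stops advancing from here on.
            contiguous = False
        high = max(high, end)

    return low, high


if __name__ == "__main__":
    t0 = datetime(2019, 1, 1)
    hour = timedelta(hours=1)
    # Hours 0-1 and 1-2 are present, hour 2-3 is missing, hour 3-4 is present.
    segments = [(t0, t0 + hour), (t0 + hour, t0 + 2 * hour), (t0 + 3 * hour, t0 + 4 * hour)]
    low, high = low_high_watermarks(segments)
    print("low:", low, "high:", high, "gap:", low != high)
```

In this sketch a difference between `low` and `high` signals a hole in the timeline, which matches the "split between `*_low` and `*_high`" reading above.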
