GGraziadei commented on issue #8583: URL: https://github.com/apache/storm/issues/8583#issuecomment-4363520272
Hello @rzo1, thank you for your comment.

1. My ultimate goal is to move towards a self-regulating system: one capable of detecting congestion in real time and dynamically modulating throughput across different stages, making tuple flow more deterministic. Any robust control system requires a feedback loop, and while this is a complex goal that will likely be achieved in multiple steps, the absolute first requirement is a reliable, high-frequency control signal. I fully recognise that similar metrics can be derived from the existing histograms, which are excellent for post-hoc analysis and human-readable dashboards. However, what I am proposing is a metric calculated on the fly, directly on the tuples in transit. There is a fundamental difference in how these signals behave: histograms with reservoirs are inherently batch-oriented or sampling-based, so they need a window of data to be collected before a statistically significant percentile can be computed. This introduces a lag that is often too high for reactive control logic. The RFC 1889 EWMA, by contrast, is calculated continuously (or with a very high sampling ratio): every single tuple (or a high percentage of tuples) updates the state in constant time. The absence of sampling delay yields a reactive feedback signal that can detect micro-bursts or jitter the moment they occur. In short, while histograms tell us how the system performed, this EWMA signal tells us how the system is performing right now. This immediacy is what will eventually allow an executor to detect a bottleneck and propagate that signal upstream to stabilize ingestion and ensure a more deterministic flow.
2. That is a valid point. Since the term "jitter" is often overloaded, I will explicitly label this as RFC 1889 Inter-arrival Jitter (definition A) in the documentation. By naming the metric precisely (e.g., jitter_rfc1889_a), we can clearly distinguish this reactive control signal from the descriptive statistics (P99 or StdDev) already provided by Dropwizard.
3. I completely agree that the smoothing factor has to be a configurable parameter and not a constant.
4. In line with point 1, the idea is to extend BoltExecutor and SpoutExecutor with two new classes (e.g. BoltDeterministicExecutor / SpoutDeterministicExecutor).
5. That is a great point regarding selection bias. Including failed tuples is necessary for a complete picture, but I want to avoid polluting the jitter signal with the pathological latencies typical of timeouts. I propose splitting this into two distinct metrics: one for successful tuples and one for failed/timed-out tuples. This provides full visibility without compromising the precision of the primary control signal.
6. I completely agree; this limitation needs to be stated in the documentation.
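
To make the constant-time update from point 1 concrete, here is a minimal sketch of an RFC 1889-style interarrival jitter estimator with the configurable smoothing factor from point 3. The class name `InterarrivalJitter`, the `update` method, and the `gain` parameter are hypothetical and not part of Storm; the sketch assumes each tuple carries an emit timestamp and a receive timestamp in the same time unit.

```java
/**
 * Illustrative sketch only (hypothetical class, not a Storm API).
 * Implements the RFC 1889 interarrival jitter recurrence
 *   J = J + gain * (|D| - J)
 * where D is the difference in transit time between two consecutive tuples.
 * RFC 1889 fixes gain at 1/16; here it is a constructor parameter.
 * Not thread-safe: assumes a single caller thread.
 */
final class InterarrivalJitter {
    private final double gain;
    private double jitter = 0.0;
    private long previousTransit;
    private boolean hasPrevious = false;

    InterarrivalJitter(double gain) {
        if (!(gain > 0.0 && gain <= 1.0)) {
            throw new IllegalArgumentException("gain must be in (0, 1]");
        }
        this.gain = gain;
    }

    /** Folds one tuple into the estimate in O(1) and returns the new value. */
    double update(long sendTime, long receiveTime) {
        long transit = receiveTime - sendTime;
        if (hasPrevious) {
            // |D(i-1, i)|: change in transit time versus the previous tuple.
            long d = Math.abs(transit - previousTransit);
            jitter += gain * (d - jitter);
        }
        previousTransit = transit;
        hasPrevious = true;
        return jitter;
    }

    double value() {
        return jitter;
    }
}
```

Following point 5, one possible arrangement would be to keep two such instances per executor, one updated on successful tuples and one on failed or timed-out tuples, so that timeout latencies do not leak into the primary signal.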
