Re: [I] [PROPOSAL] Introduce native jitter metrics (computed in streaming) for Component and e2e stream stability analysis (storm)

via GitHub Sat, 02 May 2026 02:42:34 -0700


GGraziadei commented on issue #8583:
URL: https://github.com/apache/storm/issues/8583#issuecomment-4363520272


   Hello @rzo1 thank you for your comment.
   
   1. My ultimate goal is to move towards a self-regulating system, one capable 
of detecting congestion in real-time and dynamically modulating throughput 
across different stages, increasing the determinism of the flow transmission. 
Any robust control system requires a feedback loop, and while this is a complex 
goal that will likely be achieved in multiple steps, the absolute first 
requirement is a reliable, high-frequency control signal. I fully recognise 
that similar metrics can be derived from existing histograms, which are 
excellent for post-hoc analysis and human-readable dashboards. However, what I 
am proposing is a metric calculated on the fly, directly on the tuples in 
transit. There is a fundamental difference in how these signals behave: 
histograms with reservoirs are inherently batch-oriented or sampling-based; 
they require a window of data to be collected before a statistically 
significant percentile can be computed. This introduces a lag that is often too 
high for reactive control logic. The RFC 1889 EWMA, by contrast, is calculated 
continuously (or with a very high sampling ratio): every single tuple (or a high 
percentage of tuples) updates the state in constant time. This lack of sampling delay 
provides a reactive feedback signal that can detect micro-bursts or jitter the 
moment they occur. In short, while histograms tell us how the system performed, 
this EWMA signal tells us how the system is performing right now. This 
immediacy is what will eventually allow an executor to detect a bottleneck and 
propagate that signal upstream to stabilize ingestion and ensure a more 
deterministic flow.
   2. That is a valid point. Since the term "jitter" is overloaded, I will 
explicitly label this as RFC 1889 Inter-arrival Jitter (definition A) in the 
documentation. By naming the metric precisely (e.g., jitter_rfc1889_a), we can 
clearly distinguish this reactive control signal from the descriptive 
statistics (P99 or StdDev) already provided by Dropwizard.
   3. I completely agree that the smoothing factor has to be a configurable 
parameter and not a constant. 
   4. In line with point 1, the idea is to extend the BoltExecutor and 
SpoutExecutor with two new classes (e.g. BoltDeterministicExecutor / 
SpoutDeterministicExecutor).
   5. That is a great point regarding selection bias. Including failed tuples 
is necessary for a complete picture, but I want to avoid polluting the jitter 
signal with the pathological latencies typical of timeouts. I propose splitting 
this into two distinct metrics: one for successful tuples and one for 
failed/timed-out tuples. This provides full visibility without compromising the 
precision of the primary control signal.
   6. I completely agree; this limitation needs to be stated in the 
documentation. 
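
   To make points 1 and 3 concrete, here is a minimal sketch of the kind of 
constant-time update I have in mind. It follows the RFC 1889 recurrence 
J = J + (|D(i-1,i)| - J) / 16, where D is the difference in transit time between 
two consecutive packets (tuples, in our case), with the 1/16 gain exposed as a 
parameter. The class name JitterEstimator, the method names, and the use of 
millisecond timestamps are hypothetical and only for illustration; none of this 
is existing Storm API.

```java
/**
 * Hypothetical sketch of an RFC 1889-style inter-arrival jitter estimator.
 * Not part of Storm; names and units are assumptions for illustration only.
 */
final class JitterEstimator {
    // Smoothing factor (EWMA gain). RFC 1889 uses 1/16; here it is configurable.
    private final double gain;
    // Current jitter estimate, in the same unit as the timestamps.
    private double jitter = 0.0;
    // Transit time of the previous sample (receive time minus send time).
    private long prevTransit;
    private boolean hasPrev = false;

    JitterEstimator(double gain) {
        if (!(gain > 0.0 && gain <= 1.0)) {
            throw new IllegalArgumentException("gain must be in (0, 1]");
        }
        this.gain = gain;
    }

    /**
     * Feeds one sample and returns the updated estimate.
     * Runs in constant time and keeps no window or reservoir.
     */
    double update(long sentMillis, long receivedMillis) {
        long transit = receivedMillis - sentMillis;
        if (hasPrev) {
            // |D(i-1, i)|: absolute change in transit time between samples.
            long d = Math.abs(transit - prevTransit);
            // J = J + gain * (|D| - J)
            jitter += gain * (d - jitter);
        }
        prevTransit = transit;
        hasPrev = true;
        return jitter;
    }

    double current() {
        return jitter;
    }
}
```

   For point 5, the assumption would be to keep two independent instances of 
such an estimator, one fed only by successful tuples and one fed by 
failed/timed-out tuples, so that timeout latencies never enter the primary 
signal.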


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
