becketqin commented on pull request #13920: URL: https://github.com/apache/flink/pull/13920#issuecomment-723520262
> A thought on performance: There are some metrics that use volatile variables.
>
> So far we have always avoided that. Even "infrequently accessed" fields, like watermarks, are quite frequent in some use cases. And a single volatile write has a very significant overhead in a high-performance pipeline. It is the number one thing we try to avoid, because it breaks the processor pipeline and puts pressure on the memory bus and the processor's cache coherency mechanism. For the metrics, performance (least impact on the main data path) should have clear precedence.
>
> For metrics, we don't need a hard "happens-before" relationship or any "immediate visibility". If the visibility is some 100s of milliseconds later (which would be a lot), it is no issue for the metrics at all, which are periodically reported every few seconds.
>
> The case where the JIT inlines the variable and eliminates cross-thread visibility entirely is AFAIK not realistic here. And if we ever find that the metric does not get reported, we can still use an occasional volatile write (like we do occasional volatile exception checks, every 1000 records, in the network stack).

Thanks for the comments and suggestions. I was actually struggling with this a little bit. The current implementation is based on the following assumptions:

* A volatile read is usually as cheap as a normal variable read unless there is write contention. Modern CPUs can serve the read from the cache line holding the shared variable; the read only goes back to main memory after the value has been updated by another core, which invalidates the reading CPU's cache line.
* In the current implementation, the main thread reads the volatile variable `shouldUpdateCurrentEventTimeLag` for every record. `shouldUpdateCurrentEventTimeLag` is set to `true` by a separate thread every second.
  If the main thread sees the value become `true`, it updates the volatile variable `currentEmitEventTimeLag` to `System.currentTimeMillis() - timestamp`. So the main thread performs one volatile write, and potentially one system call, per second.
* The `watermark` update is infrequent. From your comment, this assumption seems too optimistic, so we can probably also update `currentWatermark` only once per second.

Another part that took me some time to think about is whether we should use a number-of-records-based interval vs. a time-based interval for metric reporting. When the throughput is high, these two approaches probably won't make any difference. But in the low-throughput case, a time-based interval seems preferable. Imagine we update the metric every 1000 records, but a low-throughput stream only processes one record every 5 seconds. We would then have to wait over an hour to see the metric get updated, which seems bad.

I am curious: do we have some experience that invalidates these assumptions?
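To make the described scheme concrete, here is a minimal sketch of the time-based approach from the bullets above. It is not the PR's actual code: the class name, field access, and method names (`EventTimeLagMetric`, `onRecordEmitted`) are hypothetical; only the two volatile variable names come from the discussion. The hot path does a single volatile read per record, and a volatile write plus a `System.currentTimeMillis()` call happen at most once per flag period.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the time-based metric update described above.
class EventTimeLagMetric {
    // Read by the main thread on every record; written once per second by the
    // scheduler thread, so there is essentially no write contention.
    volatile boolean shouldUpdateCurrentEventTimeLag = false;

    // Read by the metric reporter thread; written at most once per second.
    volatile long currentEmitEventTimeLag = 0L;

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "metric-flag-setter");
                t.setDaemon(true);
                return t;
            });

    // Start the background thread that flips the flag once per second.
    void start() {
        scheduler.scheduleAtFixedRate(
                () -> shouldUpdateCurrentEventTimeLag = true,
                1, 1, TimeUnit.SECONDS);
    }

    // Called by the main thread for every emitted record.
    void onRecordEmitted(long eventTimestamp) {
        if (shouldUpdateCurrentEventTimeLag) {           // cheap volatile read
            // Rare volatile write + system call, at most once per second.
            currentEmitEventTimeLag = System.currentTimeMillis() - eventTimestamp;
            shouldUpdateCurrentEventTimeLag = false;
        }
    }
}
```

Note that the flag is reset by the main thread, not the scheduler, so the expensive write only happens when a record actually arrives while the flag is up.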
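For comparison, the number-of-records-based alternative mentioned in the quoted comment ("an occasional volatile write, every 1000 records") could look like the following sketch. Again, the class and method names are hypothetical, not from the PR: the hot path only touches plain fields, and a volatile write happens every `UPDATE_INTERVAL` records.

```java
// Hypothetical sketch of a record-count-based update interval.
class CountBasedWatermarkMetric {
    static final int UPDATE_INTERVAL = 1000;  // records between volatile writes

    private long recordCount = 0L;                  // plain field, main thread only
    private long localWatermark = Long.MIN_VALUE;   // plain field, hot-path updates

    // Visible to the reporter thread; written once per UPDATE_INTERVAL records.
    volatile long currentWatermark = Long.MIN_VALUE;

    // Called by the main thread for every watermark update.
    void onWatermark(long watermark) {
        localWatermark = watermark;                  // cheap non-volatile write
        if (++recordCount % UPDATE_INTERVAL == 0) {
            currentWatermark = localWatermark;       // occasional volatile write
        }
    }
}
```

This makes the low-throughput concern above easy to quantify: at one record every 5 seconds, 1000 records take 1000 × 5 s ≈ 83 minutes before the reporter thread sees any update, which is the "over an hour" in the example.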
