becketqin commented on pull request #13920:
URL: https://github.com/apache/flink/pull/13920#issuecomment-723520262


   > A thought on performance: There are some metrics that use volatile 
variables.
   > 
   > So far we always avoided that. Even "infrequently accessed" fields, like 
watermarks, are quite frequent in some use cases.
   > And a single volatile write has a very significant overhead in a 
high-performance pipeline. It is the number one thing we try to avoid, because 
it breaks the processor pipeline and puts pressure on the memory bus and 
processor's cache coherency mechanism. For the metrics, performance (least 
impact on main data paths) should have clear precedence.
   > 
   > For metrics, we don't need a hard "happens before" relationship or any 
"immediate visibility". If the visibility is some 100s of milliseconds later 
(which would be a lot), it is no issue for the metrics at all, which are 
periodically reported every few seconds.
   > 
   > The case where the JIT inlines the variable and eliminates cross-thread 
visibility entirely is AFAIK not realistic here. And if we ever find that the 
metric does not get reported, we can still use an occasional volatile write 
(like the occasional volatile exception checks we do every 1000 records in the 
network stack).
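   The occasional-volatile-write idea quoted above might look roughly like the 
following sketch (class and field names, and the 1000-record interval, are 
illustrative, not actual Flink code): the hot path writes a plain field, and 
only every Nth record publishes it through a volatile field so a reporter 
thread eventually sees a recent value.

```java
// Hedged sketch: amortize the cost of volatile writes by publishing only
// every PUBLISH_INTERVAL records. Names are hypothetical, not Flink's.
public class OccasionalPublishSketch {
    private static final int PUBLISH_INTERVAL = 1000;

    // Hot-path counter: plain field, no memory barriers per record.
    private long plainCounter = 0L;

    // Published value: read by a reporter thread, so it must be volatile.
    private volatile long publishedCounter = 0L;

    /** Called on the main (record-processing) path for every record. */
    public void onRecord() {
        plainCounter++;
        if (plainCounter % PUBLISH_INTERVAL == 0) {
            // One volatile write per PUBLISH_INTERVAL records.
            publishedCounter = plainCounter;
        }
    }

    /** Called by the reporter thread; may lag by up to PUBLISH_INTERVAL records. */
    public long getPublishedCount() {
        return publishedCounter;
    }
}
```

   The trade-off is staleness: the reporter can be up to `PUBLISH_INTERVAL` 
records behind, which is exactly the low-throughput concern discussed below.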
   
   Thanks for the comments and suggestions. I was actually struggling with this 
a little bit. The current implementation is based on the following assumptions:
   * A volatile read is usually as cheap as a normal variable read unless there 
is write contention. Modern CPUs can serve the read from the cache line holding 
the shared variable; the read only goes past the local cache after the value 
has been updated by another core, which invalidates the reading core's cache 
line.
   * In the current implementation, the main thread reads the volatile variable 
`shouldUpdateCurrentEventTimeLag` for every record. 
`shouldUpdateCurrentEventTimeLag` is set to true by a separate thread every 
second. If the main thread sees the value become true, it updates the volatile 
variable `currentEmitEventTimeLag` with `System.currentTimeMillis() - 
timestamp`. So the main thread performs a volatile write, and potentially a 
system call, only once per second.
   * The `watermark` update is infrequent. From your comment this assumption 
seems too optimistic, so we can probably also update `currentWatermark` only 
once per second.
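   The flag-based pattern in the second bullet can be sketched roughly as 
follows (a minimal illustration; the class name and method names are 
hypothetical, only the two field names come from the discussion above):

```java
// Hedged sketch of the described pattern: a background thread raises a
// volatile flag once per second; the per-record path pays only a volatile
// read, and performs the volatile write plus System.currentTimeMillis()
// at most once per second.
public class EventTimeLagSketch {
    private volatile boolean shouldUpdateCurrentEventTimeLag = false;
    private volatile long currentEmitEventTimeLag = 0L;

    /** Called by a scheduled reporter thread, e.g. once per second. */
    public void markUpdateDue() {
        shouldUpdateCurrentEventTimeLag = true;
    }

    /** Called by the main thread for every emitted record. */
    public void onRecordEmitted(long recordTimestamp) {
        if (shouldUpdateCurrentEventTimeLag) {        // cheap volatile read
            shouldUpdateCurrentEventTimeLag = false;  // lower the flag
            // Volatile write + currentTimeMillis() only when the flag was up.
            currentEmitEventTimeLag =
                    System.currentTimeMillis() - recordTimestamp;
        }
    }

    /** Read by the metrics reporter. */
    public long getCurrentEmitEventTimeLag() {
        return currentEmitEventTimeLag;
    }
}
```

   In a real setup the `markUpdateDue()` call would come from something like a 
`ScheduledExecutorService` firing at a fixed rate.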
   
   Another part that took me some time to think about is whether we should use 
a record-count-based interval or a time-based interval for metric reporting. At 
high throughput the two approaches probably make no difference, but at low 
throughput the time-based interval seems preferable. Imagine we update the 
metric every 1000 records, but a low-throughput stream only processes one 
record every 5 seconds. We would then have to wait over an hour to see the 
metric update, which seems bad.
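   The arithmetic behind that worst case (with the method name being 
illustrative, not from any real API):

```java
// Worst-case staleness of a count-based metric interval: the metric can be
// up to (intervalRecords / throughput) seconds old.
public class StalenessExample {
    static double worstCaseStalenessSeconds(long intervalRecords,
                                            double recordsPerSecond) {
        return intervalRecords / recordsPerSecond;
    }

    public static void main(String[] args) {
        // Update every 1000 records, one record every 5 seconds:
        double seconds = worstCaseStalenessSeconds(1000, 1.0 / 5.0);
        // 5000 seconds, i.e. roughly 83 minutes -- over an hour.
        System.out.println(seconds / 60.0 + " minutes");
    }
}
```

   A time-based interval bounds the staleness by the interval itself, 
independent of throughput.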
   
   I am curious: do we have any experience that invalidates these assumptions?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

