wu-sheng opened a new issue #2773: [Proposal] OAP Federation Mode
URL: https://github.com/apache/skywalking/issues/2773
 
 
   # Federation 
   ## Definition
   Federation is a new mode in which the SkyWalking OAP server could run. It is 
designed to support monitoring and aggregating metrics (no topology) across 
clouds and regions.
   
   ## Typical Scenario
   Service A could be deployed in two clouds, C and N. The OPS team should set 
up two OAP clusters to monitor the C and N clouds, to get topology, metrics, 
traces, and alarms in both clouds.
   
   At the same time, the OPS team wants to see the overview metrics of Service 
A across the clouds, and set alarms based on them. This is where Federation 
comes in.
   
   ## Core
   Basically, Federation is just a particular mode of an OAP server cluster, so 
it shares most of the current OAP codebase, such as modularization, receivers, 
OAL, and pluggable storage.
   
   The new parts of a Federation OAP are as follows.
   ### `federation-forward` module and provider
   A new module and provider need to be added. SkyWalking recently added the 
`exporter` module and a gRPC implementation, but it is not the same as what 
Federation needs.
   1. The existing `exporter` reports the total value of a metric.
   1. `Federation` requires reporting the increment of a metric, with its 
details (such as the latency matrix behind p99), not just the value, because 
the Federation upstream has to do the aggregation itself.
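
   To illustrate why the increment needs to carry its details, here is a minimal sketch, assuming the "latency matrix" is a histogram of call counts per latency bucket. `LatencyDelta` and its methods are hypothetical names, not existing SkyWalking classes. With the buckets available, the upstream can merge increments from several clouds and compute p99 over the merged data, which cannot be done from the downstream p99 values alone.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical increment of a latency metric for one time bucket. */
final class LatencyDelta {
    // Upper bound of each latency bucket in ms -> number of calls in this increment.
    private final TreeMap<Integer, Long> buckets = new TreeMap<>();

    /** Downstream side: count calls that fell into the given latency bucket. */
    void record(int bucketUpperBoundMs, long calls) {
        buckets.merge(bucketUpperBoundMs, calls, Long::sum);
    }

    /** Upstream side: fold another downstream increment into this one. */
    void merge(LatencyDelta other) {
        other.buckets.forEach((bound, calls) -> buckets.merge(bound, calls, Long::sum));
    }

    /** Smallest bucket bound that covers at least the given percent of calls. */
    int percentile(int percent) {
        long total = 0;
        for (long calls : buckets.values()) {
            total += calls;
        }
        long threshold = (long) Math.ceil(total * percent / 100.0);
        long seen = 0;
        for (Map.Entry<Integer, Long> entry : buckets.entrySet()) {
            seen += entry.getValue();
            if (seen >= threshold) {
                return entry.getKey();
            }
        }
        return 0; // no data recorded
    }
}
```

   For example, if cloud C reports 98 calls in the 100 ms bucket and 2 calls in the 1000 ms bucket, while cloud N reports 900 calls in the 100 ms bucket, the per-cloud p99 values are 1000 and 100, but p99 over the merged histogram is 100.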
   
   Also, I recommend putting the `federation-forward` worker at 
`MetricsPersistentWorker#L102`, before the worker queries the db and combines 
the result with the stored data. Here we need a cloned version of the 
`metrics data`, to avoid concurrent modification.
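
   The ordering can be sketched as follows. This is only an illustration under assumed names (`CounterMetrics`, `ForwardingStep`, `copy`, `combine` are hypothetical, not the real worker code): the increment is copied before the persistence path merges it with the stored value, so the forwarder never sees the combined total.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a metrics object. */
final class CounterMetrics {
    final String entityId;
    long total;

    CounterMetrics(String entityId, long total) {
        this.entityId = entityId;
        this.total = total;
    }

    /** Independent copy, safe to hand to another worker. */
    CounterMetrics copy() {
        return new CounterMetrics(entityId, total);
    }

    /** Persistence path: merge the value already stored in the db. */
    void combine(CounterMetrics stored) {
        total += stored.total;
    }
}

/** Hypothetical step showing where the forward hook would sit. */
final class ForwardingStep {
    final List<CounterMetrics> forwarded = new ArrayList<>();

    void process(CounterMetrics increment, CounterMetrics storedInDb) {
        forwarded.add(increment.copy()); // forward gets the pure increment
        increment.combine(storedInDb);   // then persistence mutates the original
    }
}
```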
   
   ### New Federation Receiver and Federation Protocol
   In Federation mode, the downstream SkyWalking OAP cluster talks to the 
upstream SkyWalking cluster, so we need a new protocol (gRPC preferred) to 
report metrics with details.
   The protocol should also be extensible, because the metrics are generated 
from OAL function definitions, such as `CPMMetrics`.
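
   One possible shape for an extensible message, sketched in plain Java rather than as a gRPC definition: a generic envelope carrying the function name plus named numeric fields, and an upstream registry that dispatches by function name. All names here (`MetricsEnvelope`, `EnvelopeHandler`, `FederationReceiverSketch`, the `total` field) are assumptions for illustration, not part of any existing protocol.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical wire-level envelope: one metric increment with its details. */
final class MetricsEnvelope {
    final String function;          // OAL function that produced it, e.g. "CPMMetrics"
    final String entityId;          // service / endpoint the metric belongs to
    final long timeBucket;          // time bucket of the increment
    final Map<String, Long> fields; // function-specific details, keyed by name

    MetricsEnvelope(String function, String entityId, long timeBucket, Map<String, Long> fields) {
        this.function = function;
        this.entityId = entityId;
        this.timeBucket = timeBucket;
        this.fields = Collections.unmodifiableMap(new HashMap<>(fields));
    }
}

/** Hypothetical upstream-side handler for one function type. */
interface EnvelopeHandler {
    void accept(MetricsEnvelope envelope);
}

/** Dispatches envelopes by function name; unknown functions are skipped, not fatal. */
final class FederationReceiverSketch {
    private final Map<String, EnvelopeHandler> handlers = new HashMap<>();
    private int skipped;

    void register(String function, EnvelopeHandler handler) {
        handlers.put(function, handler);
    }

    boolean dispatch(MetricsEnvelope envelope) {
        EnvelopeHandler handler = handlers.get(envelope.function);
        if (handler == null) {
            skipped++; // a newer downstream may send functions this upstream does not know
            return false;
        }
        handler.accept(envelope);
        return true;
    }

    int skipped() {
        return skipped;
    }
}
```

   With this shape, adding a new OAL function only means registering another handler; the envelope itself does not change.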
   
   ### New OAL source, scope, and function
   Because the existing functions focus on aggregating detail data, new 
functions need to be added to aggregate metrics. For example, to aggregate 
`PercentMetrics`, we need a `PercentMetricsAdd` function.
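
   A minimal sketch of what a `PercentMetricsAdd`-style function could do, assuming a percent metric is backed by a `total` count and a `match` count (the class `PercentIncrement` and the scaling by 10000 are assumptions of this sketch). The point is that the upstream adds the counts and recomputes the percentage, instead of averaging the downstream percentages.

```java
/** Hypothetical increment of a percent-style metric. */
final class PercentIncrement {
    final long total; // all calls in the increment
    final long match; // calls that satisfied the condition (e.g. successful)

    PercentIncrement(long total, long match) {
        this.total = total;
        this.match = match;
    }

    /** Metric-level aggregation: add the counts, never the percentages. */
    static PercentIncrement add(PercentIncrement a, PercentIncrement b) {
        return new PercentIncrement(a.total + b.total, a.match + b.match);
    }

    /** Percentage scaled by 100, so 9500 means 95.00% (scaling is an assumption). */
    int percentage() {
        return total == 0 ? 0 : (int) (match * 10000 / total);
    }
}
```

   For example, 50 matches out of 100 calls in cloud C (50%) and 900 out of 900 in cloud N (100%) combine to 950 out of 1000, which is 95%, not the 75% that averaging the two percentages would give.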
   
   ## Future
   Federation deployment could have multiple levels, such as
   1. Set up Federation for a region to support multiple clusters.
   1. Set up second-level Federation for a data center.
   1. Set up third-level Federation for the whole country.
   
   ## Robustness and Performance
   Like other parts of SkyWalking's design, Federation forward just tries its 
best to deliver metrics to the upstream, with no 100% guarantee. Federation 
forward could consider supporting an MQ or data file buffer to improve this.
   
   At stage 1, I prefer to do gRPC forward only, because even if some data is 
lost, only several seconds of metrics are lost. But it should set up extension 
points, such as SPI or the module provider mechanism, to make extension easier.
   
   But at the least, there should be a DataCarrier (blocking) queue in 
`federation-forward` to make sure gRPC stream mode works.
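
   The queue in front of the gRPC stream can be sketched as below. This uses `java.util.concurrent.ArrayBlockingQueue` as a stand-in for DataCarrier, and `ForwardBuffer` is a hypothetical name. The producer blocks for a bounded time when the queue is full and then drops the item, which matches the best-effort delivery described above; a consumer drains batches to send upstream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded buffer between the metrics worker and the gRPC sender. */
final class ForwardBuffer<T> {
    private final BlockingQueue<T> queue;
    private final AtomicLong dropped = new AtomicLong();

    ForwardBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Blocks up to timeoutMillis when full; drops the item after that (best effort). */
    boolean submit(T item, long timeoutMillis) {
        try {
            boolean accepted = queue.offer(item, timeoutMillis, TimeUnit.MILLISECONDS);
            if (!accepted) {
                dropped.incrementAndGet();
            }
            return accepted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            dropped.incrementAndGet();
            return false;
        }
    }

    /** Sender side: take up to maxBatch items to write onto the stream. */
    List<T> drain(int maxBatch) {
        List<T> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        return batch;
    }

    long dropped() {
        return dropped.get();
    }
}
```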
   
   ## TODO fix
   In `PersistenceTimer#L52`, the persistence execution interval is static; it 
needs to be made configurable.
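
   A minimal sketch of reading the interval from configuration instead of hard-coding it. The property key `persistence.intervalSeconds`, the fallback value, and the class name are all hypothetical; the real change would go through the OAP module configuration.

```java
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that makes the persistence interval configurable. */
final class PersistenceIntervalSketch {
    static final String KEY = "persistence.intervalSeconds"; // assumed key name
    static final int FALLBACK_SECONDS = 3;                    // assumed fallback value

    /** Returns the configured interval, or the fallback if missing or invalid. */
    static int resolveSeconds(Properties config) {
        String raw = config.getProperty(KEY);
        if (raw == null) {
            return FALLBACK_SECONDS;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : FALLBACK_SECONDS;
        } catch (NumberFormatException e) {
            return FALLBACK_SECONDS;
        }
    }

    /** Schedules the persistence task with the resolved interval. */
    static ScheduledExecutorService start(Properties config, Runnable persistOnce) {
        int seconds = resolveSeconds(config);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(persistOnce, seconds, seconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```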
   ____
   I look forward to receiving feedback about this new concept.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
