[GitHub] helix pull request #266: Propose design for aggregated cluster view service

kishoreg Sun, 21 Oct 2018 08:29:54 -0700

Github user kishoreg commented on a diff in the pull request:

    https://github.com/apache/helix/pull/266#discussion_r226866986
  
    --- Diff: designs/aggregated-cluster-view/design.md ---
    @@ -0,0 +1,353 @@
    +Aggregated Cluster View Design
    +==============================
    +
    +## Introduction
    +Currently Helix organizes information by cluster: clusters are autonomous 
entities that hold resource / node information.
    +In practice, a Helix client might need to access aggregated 
information about Helix clusters in different data center regions for management 
or coordination purposes.
    +This design proposes a service in the Helix ecosystem for clients to retrieve 
cross-datacenter information more efficiently.
    +
    +
    +## Problem Statement
    +We identified a couple of use cases for accessing cross datacenter 
information. [Ambry](https://github.com/linkedin/ambry) is one of them.
    +Here is a simplified example: a service has a Helix cluster "MyDBCluster" 
in each of 3 data centers, and each cluster has a resource named "MyDB".
    +To federate this "MyDBCluster", the current approach is to have each federation 
client (usually a Helix spectator) connect to the metadata store endpoints in all 
fabrics, retrieve information, and aggregate it locally.
    +Such usage has the following drawbacks:
    +
    +* As there are a lot of clients in each DC that need cross-DC information, 
there is a lot of expensive cross-DC traffic
    +* Every client needs to know information about metadata stores in all 
fabrics, which
    +  * Increases operational cost when this information changes
    +  * Raises security concerns by allowing cross data center traffic
    +
    +To solve the problem, we have the following requirements:
    +* Clients should still be able to GET/WATCH aggregated information from 1 
or more metadata stores (likely but not necessarily from different data centers)
    +* Cross DC traffic should be minimized
    +* Reduce the amount of information about data centers that a client needs
    +* Agility of information aggregation can be configured
    +* Currently, it's good enough to have only LiveInstance, InstanceConfig, 
and ExternalView aggregated
    +
    +
    +
    +
    +
    +## Proposed Design
    +
    +To provide an aggregated cluster view, the solution I'm proposing is to add a 
special type of cluster, i.e. **View Cluster**.
    +A view cluster leverages current Helix semantics to store aggregated 
information of various **Source Clusters**.
    +Another microservice (Helix View Aggregator) will run alongside, 
fetching information from the clusters to be aggregated (likely in other data 
centers) and storing it in the view cluster.
    --- End diff --
    
    Why can't we just set up ZooKeeper observers?
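
    For reference, a minimal sketch of the kind of observer setup this question
    alludes to, assuming a stock ZooKeeper ensemble. The host names, ports, and
    server IDs below are hypothetical placeholders, not taken from this PR. An
    observer is a non-voting ensemble member, so a remote data center could
    serve reads from a local one:

    ```
    # zoo.cfg on the observer node (hypothetical host zk-obs.dc2.example.com)
    peerType=observer

    # Server list, identical on every ensemble member; the :observer suffix
    # marks server.4 as a non-voting observer
    server.1=zk1.dc1.example.com:2888:3888
    server.2=zk2.dc1.example.com:2888:3888
    server.3=zk3.dc1.example.com:2888:3888
    server.4=zk-obs.dc2.example.com:2888:3888:observer
    ```

    An observer only replicates the ensemble it joins, so with one ensemble per
    data center, clients would presumably still have to combine the per-ensemble
    views themselves.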
