It's better to use observers since the replication is timeline consistent,
i.e. changes are seen in the same order as they happened on the originating
cluster. Achieving correctness is easier with the observer model. I agree that
we might have to replicate changes we don't care about, but changes to ZK are
multiple orders of magnitude smaller than replicating a database.
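
To make the observer setup concrete, this is roughly what it looks like on
the ZooKeeper side (hosts and ports below are just placeholders): the
observer sits in the client's DC, follows the voting ensemble in the source
DC, and serves reads and watches locally without voting in the quorum.

    # zoo.cfg (hosts and ports here are made up)
    # "peerType=observer" goes only in the observer node's own config
    peerType=observer
    # every node's config lists the observer with the ":observer" suffix
    server.1=zk1.source-dc.example.com:2888:3888
    server.2=zk2.source-dc.example.com:2888:3888
    server.3=zk3.source-dc.example.com:2888:3888
    server.4=zk-observer.local-dc.example.com:2888:3888:observer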

You can still have the aggregation logic as part of the client library.
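
As a rough sketch of what that client-library aggregation could look like
with today's Helix APIs (the ZK endpoints, which would point at the DC-local
observers, plus the cluster and resource names are all placeholders, not a
proposed API):

    // Sketch: merge the ExternalView of one resource across per-DC observers.
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.helix.manager.zk.ZKHelixAdmin;
    import org.apache.helix.model.ExternalView;

    public class FederatedViewSketch {
      // Returns partition -> (instance -> state), merged across all DCs.
      public static Map<String, Map<String, String>> aggregate(
          String[] localObserverZkAddrs, String cluster, String resource) {
        Map<String, Map<String, String>> merged = new HashMap<>();
        for (String zkAddr : localObserverZkAddrs) {
          ZKHelixAdmin admin = new ZKHelixAdmin(zkAddr);
          try {
            ExternalView ev = admin.getResourceExternalView(cluster, resource);
            if (ev == null) {
              continue;
            }
            for (String partition : ev.getPartitionSet()) {
              merged.computeIfAbsent(partition, k -> new HashMap<>())
                  .putAll(ev.getStateMap(partition));
            }
          } finally {
            admin.close();
          }
        }
        return merged;
      }
    }

Since each zkAddr points at the observer in the client's own DC, the reads
stay local, and the data still reflects the source cluster's changes in the
order they happened.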


On Tue, Oct 23, 2018 at 2:02 PM zhan849 <[email protected]> wrote:

> Github user zhan849 commented on a diff in the pull request:
>
>     https://github.com/apache/helix/pull/266#discussion_r227562948
>
>     --- Diff: designs/aggregated-cluster-view/design.md ---
>     @@ -0,0 +1,353 @@
>     +Aggregated Cluster View Design
>     +==============================
>     +
>     +## Introduction
>     +Currently Helix organizes information by cluster - clusters are
> autonomous entities that hold resource / node information.
>     +In practice, a Helix client might need to access aggregated
> information from Helix clusters in different data center regions for
> management or coordination purposes.
>     +This design proposes a service in the Helix ecosystem for clients to
> retrieve cross-datacenter information in a more efficient way.
>     +
>     +
>     +## Problem Statement
>     +We identified a couple of use cases for accessing cross datacenter
> information. [Ambry](https://github.com/linkedin/ambry) is one of them.
>     +Here is a simplified example: some service has a Helix cluster
> "MyDBCluster" in each of 3 data centers, and each cluster has a
> resource named "MyDB".
>     +To federate this "MyDBCluster", the current usage is to have each
> federation client (usually a Helix spectator) connect to metadata store
> endpoints in all fabrics to retrieve information and aggregate it locally.
>     +Such usage has the following drawbacks:
>     +
>     +* As there are a lot of clients in each DC that need cross-dc
> information, there is a lot of expensive cross-dc traffic
>     +* Every client needs to know information about metadata stores in all
> fabrics, which
>     +  * Increases operational cost when this information changes
>     +  * Increases security concerns by allowing cross data center traffic
>     +
>     +To solve the problem, we have the following requirements:
>     +* Clients should still be able to GET/WATCH aggregated information
> from 1 or more metadata stores (likely but not necessarily from different
> data centers)
>     +* Cross DC traffic should be minimized
>     +* Reduce the amount of data center information that a client needs
>     +* The agility of information aggregation should be configurable
>     +* Currently, it's good enough to have only LiveInstance,
> InstanceConfig, and ExternalView aggregated
>     +
>     +
>     +
>     +
>     +
>     +## Proposed Design
>     +
>     +To provide an aggregated cluster view, the solution I'm proposing is
> to add a special type of cluster, i.e. a **View Cluster**.
>     +The view cluster leverages current Helix semantics to store aggregated
> information from various **Source Clusters**.
>     +There will be another micro service (Helix View Aggregator) running,
> fetching information from the clusters to be aggregated (likely from other
> data centers) and storing it in the view cluster.
>     --- End diff --
>
>     Though setting up observers local to clients can potentially reduce
> cross data center traffic, it has a few drawbacks:
>     1. All data changes will be propagated immediately, and if such
> information is not required frequently, there will be wasted traffic.
> Building a service makes it possible to customize aggregation granularity
>     2. Using a ZooKeeper observer leaves aggregation logic to the client -
> providing aggregated data will make it easier for users to consume
>     3. Building a service will leave room to customize aggregated data in
> the future, e.g. if we want to aggregate the ideal state, we might not need
> to aggregate the preference list, etc.
>
>     Will add these points to the design doc.
>
>
> ---
>
