GitHub user nareshv added a comment to the discussion: Thousands of cluster 
geo-replication for fan-in aggregation ?

> There would be a lot of amplification of traffic to the target cluster 
> "cAgg". The traffic throughput matters a lot and there is a concern how to 
> scale things. This model wouldn't be scalable from a design perspective when 
> thousands of partitions all aggregate to a single partition.

These c1..cN are pulsar-standalone instances running on small-footprint 
devices (1 core, 1 GB memory) with very few topics and a very small byte rate 
(KB/sec) toward the `cAgg` cluster. 

> I probably wouldn't use a global configuration store at all in such 
> configurations.

When we create a tenant/namespace/topic with replication-clusters, it would 
internally use the geo-config-store, right?

If a solution without the geo-config store is possible, would it look 
something like this?

1. deploy standalones on c1..cN 
2. create c1..cN cluster names without geo-config-store
3. create cAgg cluster
4. create tenant/namespace/topic on all c1..cN, cAgg standalones
5. update the tenant/namespace replication-clusters on each c1..cN so that 
the replication-clusters value is the pair (c{x}, cAgg)
6. producing messages to the c{x} cluster will then replicate the data to cAgg ?
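Steps 2-5 above could be sketched with `pulsar-admin` roughly as below. This is a dry-run sketch (it only prints the commands); the hostnames, tenant, and namespace names are assumptions, and the same commands would be repeated on each c{x} standalone with its own cluster name:

```shell
# Dry-run sketch: the admin() wrapper prints each pulsar-admin invocation
# instead of executing it against a live standalone.
admin() { echo bin/pulsar-admin "$@"; }

# 2./3. register the remote cAgg cluster on the c1 standalone
#       (no geo-config-store; cluster metadata is created locally)
admin clusters create cAgg \
  --url http://cagg.example.com:8080 \
  --broker-url pulsar://cagg.example.com:6650

# 4. create the tenant/namespace with both clusters allowed
admin tenants create edge-tenant --allowed-clusters c1,cAgg
admin namespaces create edge-tenant/metrics

# 5. restrict replication to the (c1, cAgg) pair
admin namespaces set-clusters edge-tenant/metrics --clusters c1,cAgg
```

To actually apply the steps, drop the `echo` from the wrapper; the same tenant/namespace must also exist on the cAgg side so replicated messages have a destination.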

> If aggregation using geo-replication is really necessary, it would be 
> recommended to have a sharded design so that there are multiple aggregation 
> clusters where the final results are then aggregated possibly using a streams 
> processing solution.

Is there a sweet spot for the number of c1..cN clusters per group? Online 
presentations mention geo-replication with up to 100 clusters.
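A back-of-envelope sizing for the sharded design can be sketched as below; the per-edge rate and total edge count are assumptions (the thread only says "KB/sec"), while the 100-cluster shard size comes from the presentations mentioned above:

```shell
# Back-of-envelope: aggregate ingest per cAgg shard (numbers are assumptions)
edge_rate_kb_s=10      # ~KB/s produced by each standalone c{x} (assumed)
shard_size=100         # edge clusters per aggregation shard (per the talks)
num_edges=5000         # assumed total number of c1..cN standalones

# ceiling division: how many aggregation shards are needed
shards=$(( (num_edges + shard_size - 1) / shard_size ))
per_shard_kb_s=$(( edge_rate_kb_s * shard_size ))
echo "$shards shards, ~$per_shard_kb_s KB/s into each cAgg shard"
```

At these assumed rates each shard ingests only ~1 MB/s, so the scaling concern is less about bytes and more about per-replication-connection overhead on the target brokers.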

GitHub link: 
https://github.com/apache/pulsar/discussions/22438#discussioncomment-9018510
