GitHub user nareshv added a comment to the discussion: Thousands of cluster
geo-replication for fan-in aggregation ?
> There would be a lot of amplification of traffic to the target cluster
> "cAgg". The traffic throughput matters a lot and there is a concern how to
> scale things. This model wouldn't be scalable from a design perspective when
> thousands of partitions all aggregate to a single partition.
These c1..cN are pulsar-standalone instances running on small-footprint
devices (1 core, 1 GB memory) with very few topics and a very small byte rate
(KB/sec) toward the `cAgg` cluster.
> I probably wouldn't use a global configuration store at all in such
> configurations.
When we create a tenant/namespace/topic with replication-clusters, it would
internally use the geo-config-store, right?
If a solution without the geo-config-store is possible, would it look
something like this?
1. deploy standalones on c1..cN
2. create c1..cN cluster names without geo-config-store
3. create cAgg cluster
4. create tenant/namespace/topic on all c1..cN, cAgg standalones
5. update the tenant/namespace replication-clusters on each c1..cN so that
its value is the pair (c{x}, cAgg)
6. now producing messages to a c{x} cluster will replicate the data to cAgg?
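If the geo-config-store-free flow above is workable, steps 2-5 might look
roughly like the following `pulsar-admin` sketch, run against the edge
standalone c1 (hostnames, the `edge-tenant` tenant, and the `telemetry`
namespace are placeholder names of mine, and this assumes each standalone can
reach the cAgg service URLs):

```
# On edge standalone c1: register the remote aggregation cluster
# so the local broker knows where to replicate.
bin/pulsar-admin clusters create cAgg \
  --url http://cagg-host:8080 --broker-url pulsar://cagg-host:6650

# Tenant allowed on the local cluster and the aggregation cluster.
bin/pulsar-admin tenants create edge-tenant \
  --allowed-clusters "c1,cAgg"

bin/pulsar-admin namespaces create edge-tenant/telemetry

# Step 5: set the replication pair (c{x}, cAgg) on the namespace.
bin/pulsar-admin namespaces set-clusters edge-tenant/telemetry \
  --clusters "c1,cAgg"
```

The same sequence would be repeated on each c{x} with its own local cluster
name in the pair, and the tenant/namespace mirrored on cAgg.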
> If aggregation using geo-replication is really necessary, it would be
> recommended to have a sharded design so that there are multiple aggregation
> clusters where the final results are then aggregated possibly using a streams
> processing solution.
Is there a sweet spot for the size of a c1..cN grouping? Online presentations
talk about geo-replication across up to 100 clusters.
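The sharded design suggested in the reply could be sketched as a simple
deterministic mapping from edge clusters to aggregation shards; `agg_shard`
and the shard-naming scheme below are hypothetical illustrations, not Pulsar
APIs, sized so each shard's fan-in stays near the ~100-cluster figure cited
in talks:

```python
import hashlib

def agg_shard(edge_cluster: str, num_agg: int) -> str:
    """Deterministically map an edge cluster (c1..cN) to one of
    num_agg aggregation clusters (cAgg-0 .. cAgg-{num_agg-1})."""
    h = int(hashlib.sha256(edge_cluster.encode()).hexdigest(), 16)
    return f"cAgg-{h % num_agg}"

# Spread 1000 edge clusters over 10 aggregation shards, so each
# shard aggregates roughly 100 edge clusters; a downstream streams
# job would then combine the 10 shard outputs.
assignments = {f"c{i}": agg_shard(f"c{i}", 10) for i in range(1, 1001)}
```

Hashing keeps the mapping stable across restarts without any shared
coordination store, at the cost of some imbalance between shards.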
GitHub link:
https://github.com/apache/pulsar/discussions/22438#discussioncomment-9018510
----
This is an automatically sent email for [email protected].