GitHub user lhotari added a comment to the discussion: Thousands of cluster geo-replication for fan-in aggregation ?
> Does pulsar support such a model ?

Yes. However, "support" is perhaps not the correct word here, since 10K clusters within geo-replication would be a very extreme use case.

> What are the scalability concerns to be worried about ?

There would be a lot of traffic amplification toward the target cluster "cAgg". The traffic throughput matters a lot, and there is a concern about how to scale things. This model wouldn't be scalable from a design perspective when thousands of partitions all aggregate to a single partition.

> Any impact on the topic-stats api or admin-api as it lists all replications ?

That's probably not a major concern. However, it could be unmanageable with thousands of replications.

> Any impact on the geo-config-store ?

I probably wouldn't use a global configuration store at all in such configurations.

> Any other considerations for implementing this model ?

I don't have the context of what the use case is and what the volumes are. Based on the provided information, I'd put more focus on why the aggregation is needed and how to find a scalable design for aggregation. Perhaps the aggregation is a stream processing problem and could be handled with multiple levels of aggregation, implemented with Flink and its Pulsar connector?

If aggregation using geo-replication is really necessary, I'd recommend a sharded design with multiple aggregation clusters, where the final results are then aggregated, possibly using a stream processing solution.

There are also other types of solutions for aggregation that are compatible with Pulsar. For example, StreamNative has announced a "Streaming Lakehouse" product, "Lakehouse Tiered Storage for Pulsar". More details in the [video](https://streamnative.io/videos/streaming-data-into-your-lakehouse-introducing-pulsars-lakehouse-tiered-storage) and [blog post](https://streamnative.io/blog/streaming-lakehouse-introducing-pulsars-lakehouse-tiered-storage).
This opens up completely new possibilities for aggregating the results and saving on costs.

GitHub link: https://github.com/apache/pulsar/discussions/22438#discussioncomment-9017514
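
To make the sharded design mentioned in the comment a bit more concrete, here is a rough sketch of how source clusters could be spread over several aggregation clusters instead of a single "cAgg". It is only an illustration and uses no Pulsar API: the function names, the `edge-NNNNN` cluster names, and the choice of 16 shards are all assumptions made up for this example. It only counts how many replication sources each aggregation cluster would have to accept.

```python
import hashlib
from collections import Counter


def assign_aggregator(source_cluster: str, num_aggregators: int) -> int:
    """Map a source cluster name to an aggregation shard index.

    Uses SHA-256 so the mapping is stable across processes and restarts
    (Python's built-in hash() is randomized per process for strings).
    Hypothetical helper, not part of Pulsar.
    """
    digest = hashlib.sha256(source_cluster.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_aggregators


def fan_in_per_aggregator(source_clusters, num_aggregators):
    """Return how many source clusters land on each aggregation shard."""
    counts = Counter(
        assign_aggregator(name, num_aggregators) for name in source_clusters
    )
    return [counts.get(i, 0) for i in range(num_aggregators)]


# Hypothetical fleet of 10K source clusters, as in the question.
sources = [f"edge-{i:05d}" for i in range(10_000)]

# One aggregation cluster: every source replicates into the same place.
single = fan_in_per_aggregator(sources, 1)

# 16 aggregation clusters (an arbitrary number chosen for illustration).
sharded = fan_in_per_aggregator(sources, 16)

print("fan-in with 1 aggregator:", single)
print("largest fan-in with 16 aggregators:", max(sharded))
```

With a single aggregator, the fan-in equals the total number of source clusters. With shards, each aggregation cluster only sees a fraction of them, and a second level (for example a stream processing job) would combine the per-shard results. A plain modulo mapping reshuffles most sources when the shard count changes, so something like consistent hashing may be a better fit if aggregation clusters are added or removed often.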
