599166320 opened a new issue, #14535: URL: https://github.com/apache/druid/issues/14535
### Motivation

Data centers may be located in different regions and operated by different cloud providers. Dedicated network connections usually interconnect these data centers to carry data between them. However, transferring data between data centers incurs high bandwidth costs and significant transmission delays, so it is essential to minimize such transfers.

Rather than centralizing storage for all regions in a single data center, we prefer to report data to the nearest data center. This approach has its own challenges. With a large number of data centers, we must maintain many independent Druid clusters, and because these clusters operate independently, aggregate queries across them can be inaccurate.

Supporting federated clusters in Druid would substantially reduce bandwidth costs.

### Description

Here is one possible deployment architecture: (diagram not included in this message)

To enable federated queries, the "federatedClusterBrokers" property should be added to the query context. Here is an example query:

```
curl 'http://localhost:8888/druid/v2/sql' \
  -H 'sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Content-Type: application/json' \
  -H 'Referer: http://localhost:8888/unified-console.html' \
  --data-raw '{"query":"SELECT COUNT(*),APPROX_QUANTILE_DS(added, 0.95) \"P95cdnTime\",page FROM wikipedia GROUP BY page ORDER BY 1 DESC limit 10","resultFormat":"array","header":true,"typesHeader":true,"sqlTypesHeader":true,"context":{"sqlOuterLimit":1001,"federatedClusterBrokers":"sanFrancisco:8888,hongKong:8888","sqlQueryId":"7e7c37e7-1cd4-49c2-9a20-8950350b7997"}}'
```

The above is the approach I am currently using to implement a federated cluster with Druid. Are there any better ways to support federated queries across Druid clusters?
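For readers who want to build the same request programmatically, here is a minimal Python sketch of the JSON body sent to `/druid/v2/sql`. It assumes the `federatedClusterBrokers` context key behaves as proposed in this issue (a comma-separated list of `host:port` broker addresses). The helper names `parse_brokers` and `build_federated_sql_request` are hypothetical and exist only for illustration.

```python
import json


def parse_brokers(value):
    """Split a 'host:port,host:port' string into (host, port) tuples."""
    brokers = []
    for entry in value.split(","):
        # rpartition splits on the last ':' so the port is always the tail.
        host, _, port = entry.strip().rpartition(":")
        brokers.append((host, int(port)))
    return brokers


def build_federated_sql_request(sql, brokers, outer_limit=1001):
    """Build the JSON body for POST /druid/v2/sql with the proposed context key."""
    return {
        "query": sql,
        "resultFormat": "array",
        "header": True,
        "context": {
            "sqlOuterLimit": outer_limit,
            # Assumption: this key is the one proposed in this issue,
            # formatted the same way as in the curl example.
            "federatedClusterBrokers": ",".join(f"{h}:{p}" for h, p in brokers),
        },
    }


if __name__ == "__main__":
    brokers = parse_brokers("sanFrancisco:8888,hongKong:8888")
    body = build_federated_sql_request(
        "SELECT COUNT(*), page FROM wikipedia GROUP BY page ORDER BY 1 DESC LIMIT 10",
        brokers,
    )
    # The printed JSON can be passed to curl via --data-raw.
    print(json.dumps(body, indent=2))
```

Parsing the broker list into tuples before serializing it again is only a convenience for validating the addresses; the context value that reaches the broker is the same comma-separated string used in the curl example.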
