[I] support for federated clusters (druid)

via GitHub Thu, 06 Jul 2023 07:53:11 -0700


599166320 opened a new issue, #14535:
URL: https://github.com/apache/druid/issues/14535


   ### Motivation
   
   Data centers may be located in different regions and operated by different 
cloud providers. Dedicated network links usually interconnect these data 
centers, but transferring data between them incurs high bandwidth costs and 
significant latency, so cross-data-center transfer should be kept to a minimum. 
Rather than centralizing storage for all regions in a single data center, we 
prefer to report data to the nearest data center. This approach poses a 
challenge: with many data centers, we must maintain many independent Druid 
clusters. Because each cluster holds only part of the data, an aggregate query 
against any single cluster returns incomplete, and therefore inaccurate, 
results. Supporting federated clusters in Druid would allow accurate 
cross-cluster queries while substantially reducing bandwidth costs.
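
   To illustrate why results from independent clusters cannot always be 
combined after the fact, here is a small sketch. The per-cluster values are 
made up, and an exact nearest-rank percentile stands in for the sketch-based 
APPROX_QUANTILE_DS, so this shows the general problem rather than Druid's 
actual behavior:

```python
def p95(values):
    """Exact nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    # ceil(0.95 * n) computed with integers to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical per-cluster measurements; the numbers are illustrative only.
san_francisco = list(range(1, 101))   # 100 values: 1..100
hong_kong = list(range(201, 221))     # 20 values: 201..220

# Counts merge correctly: the global count is the sum of per-cluster counts.
total_count = len(san_francisco) + len(hong_kong)

# Percentiles do not: averaging per-cluster P95s differs from the global P95.
naive_p95 = (p95(san_francisco) + p95(hong_kong)) / 2
exact_p95 = p95(san_francisco + hong_kong)

print(total_count, naive_p95, exact_p95)
```

   Counts can simply be summed, but a percentile needs the underlying data (or 
a mergeable intermediate such as a sketch) from every cluster, which is what a 
federated query would have to collect.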
   
   ### Description
   
   Here is one possible deployment architecture:
   
   
![image](https://github.com/apache/druid/assets/3204398/e340ad06-30bc-4f8d-b1ed-bc783086b913)
   
   
   To enable federated queries in my implementation, add the 
"federatedClusterBrokers" property to the query context. Here is an example 
query:
   ```
   curl 'http://localhost:8888/druid/v2/sql' \
     -H 'sec-ch-ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"' \
     -H 'Accept: application/json, text/plain, */*' \
     -H 'Content-Type: application/json' \
     -H 'Referer: http://localhost:8888/unified-console.html' \
     --data-raw '{"query":"SELECT COUNT(*),APPROX_QUANTILE_DS(added, 0.95) \"P95cdnTime\",page FROM wikipedia GROUP BY page ORDER BY 1 DESC limit 10","resultFormat":"array","header":true,"typesHeader":true,"sqlTypesHeader":true,"context":{"sqlOuterLimit":1001,"federatedClusterBrokers":"sanFrancisco:8888,hongKong:8888","sqlQueryId":"7e7c37e7-1cd4-49c2-9a20-8950350b7997"}}'
   ```
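
   The same request body can also be built programmatically. This is a minimal 
sketch: "federatedClusterBrokers" is the context key proposed in this issue 
(it is part of my implementation, not a stock Druid parameter), and 
`build_federated_query` is a hypothetical helper name used only for 
illustration:

```python
import json

def build_federated_query(sql, brokers, outer_limit=1001):
    """Build a Druid SQL API request body carrying the proposed context key."""
    return {
        "query": sql,
        "resultFormat": "array",
        "header": True,
        "context": {
            "sqlOuterLimit": outer_limit,
            # Proposed in this issue: comma-separated host:port list of the
            # Brokers of the other clusters taking part in the query.
            "federatedClusterBrokers": ",".join(brokers),
        },
    }

payload = build_federated_query(
    'SELECT COUNT(*), APPROX_QUANTILE_DS(added, 0.95) "P95cdnTime", page '
    "FROM wikipedia GROUP BY page ORDER BY 1 DESC LIMIT 10",
    ["sanFrancisco:8888", "hongKong:8888"],
)

# Serialize to the JSON string that would be sent as the POST body.
body = json.dumps(payload)
print(body)
```

   The resulting `body` would be POSTed to `/druid/v2/sql` with 
`Content-Type: application/json`, as in the curl command above.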
   
   The above is the current approach I am using to implement a federated 
cluster with Druid. Are there any better ways to support federated queries in 
Druid clusters?
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

