kroeders opened a new issue #10294:
URL: https://github.com/apache/druid/issues/10294


   ### Motivation
   
   Given a query with lookups, one server with a missing lookup can cause query 
execution to fail. When the broker distributes a query to historicals and 
realtime servers, if any one of those servers does not have the lookup, the 
query fails as a whole. Lookups can fail to load for a number of reasons, such 
as missing firewall rules, missing drivers, or slow loading of large, 
frequently updated lookups. These queries could still be served if the broker 
considered lookup status when selecting servers for querying. 
   
   To reproduce this issue, load the druid-lookups-cached-global extension and 
create a database-backed lookup. Launch an additional historical without the 
database driver; the lookup will fail to load on that historical. Queries using 
the lookup will then fail altogether because of the one historical without it. 
   
   ### Proposed Changes
   
   The proposal is to modify the broker to track the lookup status on 
historical and realtime servers and avoid routing queries to servers where 
relevant lookups are not loaded. This can be done by making server selection 
aware of the query and excluding servers without required lookups. 
   
   #### Tracking Lookup Status in Broker
   
   The coordinator is responsible for tracking lookups and ensuring they are 
updated on query servers, so it has the information on which version has been 
successfully loaded on each node. This is available through the nodeStatus API. 
The broker can periodically poll the coordinator’s nodeStatus API and maintain 
a local cache of lookup status on each query server. 
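
   A minimal sketch of such a broker-side cache, under stated assumptions: the 
class name `LookupStatusCache` and its methods are hypothetical, and the poll 
result is assumed to be already parsed into a map from server ("host:port") to 
the names of its loaded lookups. The HTTP call and the actual nodeStatus 
payload are deliberately left out. Servers missing from the snapshot are 
treated as usable, so behaviour matches today's when no status is available; 
that is a design choice, not a requirement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical broker-side view of which lookups each query server has loaded. */
final class LookupStatusCache {
    // server ("host:port") -> names of lookups reported as loaded on it
    private final AtomicReference<Map<String, Set<String>>> snapshot =
        new AtomicReference<>(Map.of());

    /** Replace the whole view with the result of the latest poll. */
    void update(Map<String, Set<String>> latest) {
        Map<String, Set<String>> copy = new HashMap<>();
        latest.forEach((server, lookups) -> copy.put(server, Set.copyOf(lookups)));
        // Swap in an immutable snapshot so readers never see a partial update.
        snapshot.set(Map.copyOf(copy));
    }

    /**
     * True if the server has every required lookup loaded.
     * Assumption: a server absent from the snapshot is treated as usable.
     */
    boolean hasAll(String server, Set<String> required) {
        Set<String> loaded = snapshot.get().get(server);
        return loaded == null || loaded.containsAll(required);
    }
}
```

A periodic task on the broker would call `update` after each poll, and server 
selection would call `hasAll` with the lookups a query needs.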
   
   Alternatively, the broker could poll the internal listener API on the query 
servers, but this repeats work that the coordinator already does. Other 
transport mechanisms such as ZooKeeper could also be used, or the coordinator 
could push the information to the brokers. 
   
   #### Avoiding Query Servers without Lookup
   
   CachingClusteredClient is responsible for determining which servers fulfil a 
query. The process is to retrieve the set of segment-to-server mappings 
relevant to the query and then use a strategy to select a server for each 
segment. Server selection is currently not aware of the query. When filtering 
segments in TierSelectorStrategy before applying the ServerSelector strategy, 
the query could be considered to avoid query servers without required lookups. 
Default methods can be added to avoid breaking existing implementations. 
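
   The filtering step could look roughly like the sketch below. It is 
illustrative only: `LookupAwareServerFilter` is a hypothetical helper, servers 
are represented as plain strings, and none of this reflects the actual 
TierSelectorStrategy or ServerSelector signatures. Two assumptions are baked 
in: servers with unknown lookup status are kept, and if no candidate has the 
required lookups the original candidates are returned, so the query fails the 
same way it does today instead of silently dropping a segment.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical pre-selection filter; not part of Druid's API. */
final class LookupAwareServerFilter {
    /**
     * Keep only the candidate servers for a segment that have every required
     * lookup loaded. Candidates with no recorded status are kept (assumption).
     */
    static List<String> filter(
        List<String> candidates,
        Set<String> requiredLookups,
        Map<String, Set<String>> loadedByServer
    ) {
        if (requiredLookups.isEmpty()) {
            return candidates; // query uses no lookups, nothing to filter
        }
        List<String> usable = candidates.stream()
            .filter(server -> {
                Set<String> loaded = loadedByServer.get(server);
                return loaded == null || loaded.containsAll(requiredLookups);
            })
            .collect(Collectors.toList());
        // Fall back to the unfiltered list rather than leaving a segment unserved.
        return usable.isEmpty() ? candidates : usable;
    }
}
```

The normal selection strategy would then pick from the returned list as it 
does now.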
   
   Alternatively, the pick method on the ServerSelector interfaces could be 
extended with a Query parameter so that servers without relevant lookups are 
avoided. Because this is an exceptional case, the servers could also be 
filtered in CachingClusteredClient before selection. Another alternative would 
involve handling the exception from the historical/realtime server and retrying 
the query for those segments. 
   
   #### Extracting Lookups from Queries
   
   Lookups specified as functions in SQL become virtual columns with a lookup 
expression, or appear as the right-hand source of join queries. A new query 
runner could be added to extract the lookups, compare them with the lookup 
status of each server, and store the resulting blocklist of servers in the 
query context. 
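
   As a rough sketch of the last two steps, assuming the lookup names have 
already been extracted from the query: the class `LookupBlocklist` and the 
context key `lookupServerBlocklist` are hypothetical names chosen for 
illustration, and the query context is modelled as a plain map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that turns lookup status into a per-query server blocklist. */
final class LookupBlocklist {
    // Hypothetical context key; not an existing Druid query context parameter.
    static final String CONTEXT_KEY = "lookupServerBlocklist";

    /** Servers whose reported lookups do not cover everything the query needs. */
    static Set<String> compute(Set<String> requiredLookups,
                               Map<String, Set<String>> loadedByServer) {
        Set<String> blocked = new TreeSet<>();
        loadedByServer.forEach((server, loaded) -> {
            if (!loaded.containsAll(requiredLookups)) {
                blocked.add(server);
            }
        });
        return blocked;
    }

    /** Return a copy of the context with the blocklist attached; the input is not modified. */
    static Map<String, Object> withBlocklist(Map<String, Object> context,
                                             Set<String> blocked) {
        Map<String, Object> copy = new HashMap<>(context);
        copy.put(CONTEXT_KEY, blocked);
        return copy;
    }
}
```

Server selection could later read the blocklist from the context and skip 
those servers.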
   

