himanshug commented on issue #6834: [Proposal] Add published segment cache in 
broker
URL: 
https://github.com/apache/incubator-druid/issues/6834#issuecomment-507862581
 
 
   > In particular, we had thought about moving the entire sys schema 
implementation to the Coordinator and having the Broker send any SQL queries on 
`sys` over there. Users could also query sys tables on the Coordinator directly 
if they wanted.
   
   I like that because, as a user, I would like to use the feature introduced by 
the `sys` tables, but it would be nice if it weren't at the expense of each 
Broker needing a whole bunch of extra memory that I would rather save for real 
data queries.
   Regarding the counterarguments ...
   
   > It is a somewhat common request from our users to add an option to the 
Broker to either fail a query, or provide a ....
   
   We have a slightly different feature already available: the query context 
key `uncoveredIntervalsLimit`, which can be set in a query to return any 
intervals not covered by the segments that were used to process the query. This 
adds a header to the response, and the user can discard the results. I think it 
was documented at some point in the Query Contexts doc. This should work for 
many users.
   However, it is not exactly what you described. For that, a "cache" wouldn't 
be enough, because it could be stale and we wouldn't be able to guarantee 
whether results are really partial or not. But I get that "good enough" might 
be good enough.
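
A rough sketch of how a client might use this, in Python. The helper names are made up for illustration, and the header name `X-Druid-Response-Context` and the keys `uncoveredIntervals` and `uncoveredIntervalsOverflowed` are assumptions here that should be checked against the Query Contexts doc for the Druid version in use:

```python
import json

def with_uncovered_intervals_limit(query, limit):
    """Return a copy of a native query dict with uncoveredIntervalsLimit set."""
    context = dict(query.get("context", {}))
    context["uncoveredIntervalsLimit"] = limit
    return {**query, "context": context}

def uncovered_intervals(response_headers):
    """Extract (intervals, overflowed) from the response headers.

    Assumes the Broker reports uncovered intervals as JSON in the
    X-Druid-Response-Context header (assumption, see lead-in).
    """
    raw = response_headers.get("X-Druid-Response-Context")
    if not raw:
        return [], False
    ctx = json.loads(raw)
    intervals = ctx.get("uncoveredIntervals", [])
    overflowed = bool(ctx.get("uncoveredIntervalsOverflowed", False))
    return intervals, overflowed

def is_partial(response_headers):
    """Treat a result as partial if any uncovered interval was reported."""
    intervals, overflowed = uncovered_intervals(response_headers)
    return bool(intervals) or overflowed
```

A caller would send the query returned by `with_uncovered_intervals_limit` to the Broker and discard the result when `is_partial` returns true for the response headers.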
   
   > #6319 contemplates a design for finer-grained loc ...
   
   Sorry, I haven't gone through it yet, so I don't understand it.
   
   In any case, both counterarguments are basically saying that we need the 
cache at the Broker for other reasons. I would propose that we don't make a 
cache at the Broker a prerequisite for `sys` table functionality. Other 
features might need the cache when there is no other way, but I, as a user, 
would like to use `sys` without incurring extra memory at the Broker if 
possible.
   
   That said, if we do decide otherwise, then I am fine with the Broker getting 
information from the Coordinator instead of going directly to the DB, for the 
reasons that @gianm mentioned. The general expectation from a cluster is that 
data queries should continue to work as much as possible in case of node 
failures, not that all features need to work. I wouldn't worry about the 
Coordinator being down leading to the Broker not having an up-to-date response 
for `sys` table queries. When the Coordinator is down, new segments are not 
loaded on Historicals (and many other big problems happen), so "all 
Coordinators down" is a pretty bad situation anyway, which cluster operators 
would want to resolve as quickly as possible.
   
