paul-rogers edited a comment on pull request #11989:
URL: https://github.com/apache/druid/pull/11989#issuecomment-982054041


   @yuanlihan, cool feature! Here is one question you might want to explain: 
what is the use case that this change will improve?
   
   The feature would seem to optimize queries that hit exactly the same data 
every time. How common is this? It might solve a use case in which you have, 
say, 100 dashboards, each of which issues the same query. One of them runs the 
real query; the other 99 fetch from the cache.
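To make the "one real query, 99 cache hits" case concrete, here is a minimal sketch of a cache keyed by the whole query and holding the fully merged result. `PostMergeCache`, `fake_run_query`, and the query string are hypothetical illustrations, not Druid's classes or this PR's implementation.

```python
from typing import Callable, Dict, Hashable, List


class PostMergeCache:
    """Hypothetical broker-side cache: full query -> fully merged result."""

    def __init__(self, run_query: Callable[[Hashable], List[int]]) -> None:
        self._run_query = run_query  # runs the query on historicals and merges
        self._results: Dict[Hashable, List[int]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, query_key: Hashable) -> List[int]:
        if query_key in self._results:
            self.hits += 1  # identical query: served without touching historicals
        else:
            self.misses += 1  # first time: pay for scan + merge once
            self._results[query_key] = self._run_query(query_key)
        return self._results[query_key]


def fake_run_query(query_key: Hashable) -> List[int]:
    """Hypothetical stand-in for real query execution."""
    return [1, 2, 3]


cache = PostMergeCache(fake_run_query)
for _ in range(100):  # 100 dashboards issuing exactly the same query
    cache.get("SELECT COUNT(*) FROM hypothetical_datasource")
print(cache.misses, cache.hits)  # 1 99
```

This only pays off when the cache key repeats exactly; any change to the query, such as a shifted time range, produces a new key and a full miss.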
   
   Here's a similar use case I've seen many times that would *not* benefit from 
a post-merge cache: a dashboard wants to show the "last 10 minutes" of data, 
and we have minute-grain cubes (segments). Each query shifts the time range by 
one minute, which adds a bit of new data and excludes a bit of old data. 
With this design, we cache the post-merge data, so each shift in the time 
window results in a new result set to cache.
   
   Can this feature instead cache pre-merge results? That way, we cache each 
minute slice once. The merge is still needed to gather up the slices required 
by the query, but that merge is in-memory and should be pretty fast. In this 
example, each window shift would add only one new minute slice to the cache, 
roughly 1/10 the data added by caching a new 10-minute post-merge result. It 
would also put less load on the historicals, since we don't hit them for the 
whole window each time the query time range shifts forward one minute.
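The sliding-window comparison can be checked with a small simulation. This is a sketch under stated assumptions: `scan_historical` and `merge` are hypothetical stand-ins, and the cache-size figure assumes a merged 10-minute result is about as large as ten minute slices (true, for example, of a minute-granularity timeseries).

```python
from typing import Dict, List, Tuple

WINDOW = 10  # dashboard shows the "last 10 minutes"; segments are minute-grain


def scan_historical(minute: int) -> int:
    """Hypothetical stand-in for scanning one minute-grain segment on a historical."""
    return 1  # pretend partial aggregate, e.g. a row count


def merge(partials: List[int]) -> int:
    """In-memory merge of partial results on the broker (here: a simple sum)."""
    return sum(partials)


def simulate(refreshes: int, pre_merge: bool) -> Tuple[int, int]:
    """Return (segment scans on historicals, minute-slices held in the cache)."""
    post_merge_cache: Dict[Tuple[int, int], int] = {}  # (start, end) -> merged result
    slice_cache: Dict[int, int] = {}  # minute -> per-slice partial
    scans = 0
    for t in range(refreshes):
        start, end = t, t + WINDOW  # each refresh slides the window forward one minute
        if pre_merge:
            partials = []
            for minute in range(start, end):
                if minute not in slice_cache:  # after the first query, only the newest minute misses
                    slice_cache[minute] = scan_historical(minute)
                    scans += 1
                partials.append(slice_cache[minute])
            merge(partials)  # the merge still runs on every query, but in memory
        else:
            key = (start, end)
            if key not in post_merge_cache:  # every shifted window is a brand-new key
                post_merge_cache[key] = merge([scan_historical(m) for m in range(start, end)])
                scans += WINDOW
    # Assumption: a merged 10-minute result is about as large as 10 minute slices.
    cached = len(slice_cache) if pre_merge else len(post_merge_cache) * WINDOW
    return scans, cached


print(simulate(100, pre_merge=False))  # (1000, 1000)
print(simulate(100, pre_merge=True))   # (109, 109)
```

After 100 refreshes, the per-slice strategy scans and stores roughly a tenth of what the post-merge strategy does, which matches the 1/10 estimate above.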
   
   In short, please explain your use case a bit more so we can understand the 
goal of this enhancement.
   
   Edit: I'm told we may already have some of the above: historicals can already 
cache per-segment results. In a system with a large number of segments, where the 
query hits a good percentage of them, I'm told the merge cost can be high. Is 
it the merge cost that this PR seeks to avoid?

