gianm commented on a change in pull request #8116: remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention
URL: https://github.com/apache/incubator-druid/pull/8116#discussion_r306041975
 
 

 ##########
 File path: docs/content/configuration/index.md
 ##########
 @@ -1176,6 +1176,8 @@ You can optionally configure caching to be enabled on the peons by setting cachi
 |`druid.realtime.cache.useCache`|true, false|Enable the cache on the realtime.|false|
 |`druid.realtime.cache.populateCache`|true, false|Populate the cache on the realtime.|false|
 |`druid.realtime.cache.unCacheable`|All druid query types|All query types to not cache.|`["groupBy", "select"]`|
 +|`druid.realtime.cache.numBackgroundThreads`|If greater than 0, cache will be populated in the background thread pool of the configured size. By default cache is populated in the foreground, which can more efficiently handle reaching `maxEntrySize` than when done in the background. Note that there is no load shedding for background cache population, so it can also lead to out of memory scenarios depending on background threadpool utilization.|0|
 
 Review comment:
   > What does it mean for cache "to be populated in the background" and "to be 
populated in the foreground"? Could you please add a new doc page explaining 
these concepts and refer to it from all these configuration parameter 
descriptions?
   
   IMO it's better to use the space here on this page to explain the concepts a user needs to understand. It doesn't seem complicated enough that a new page is needed.
   
   > It doesn't seem to me that maxEntrySize has any relation to 
background/foreground. It's a size of one entry, how can the verb "reach" be 
applied to it?
   
   Each cache entry can be potentially built from multiple objects, and can 
potentially be built up incrementally. Imagine a cache entry that is derived 
from a `Sequence<Row>` of 100 rows, and `maxEntrySize` is "reached" after the 
first 40 rows.
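   To illustrate the point about "reaching" `maxEntrySize`, here is a minimal, hypothetical sketch (the class and method names are illustrative, not Druid's actual internals): an entry is serialized row by row, and the builder can notice partway through the sequence that the entry has grown past the cap, at which point the remaining rows never need to be serialized at all.
   
   ```java
   import java.io.ByteArrayOutputStream;
   import java.nio.charset.StandardCharsets;
   
   // Illustrative only: builds a cache entry incrementally and aborts as
   // soon as the serialized size would exceed maxEntrySize, before
   // consuming the rest of the row sequence.
   public class CacheEntryBuilder {
     private final int maxEntrySize;
     private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
     private boolean aborted = false;
   
     public CacheEntryBuilder(int maxEntrySize) {
       this.maxEntrySize = maxEntrySize;
     }
   
     // Returns false (entry is uncacheable) once the cap is hit.
     public boolean append(String serializedRow) {
       if (aborted) {
         return false;
       }
       byte[] bytes = serializedRow.getBytes(StandardCharsets.UTF_8);
       if (buffer.size() + bytes.length > maxEntrySize) {
         aborted = true;  // "maxEntrySize reached" partway through the rows
         return false;
       }
       buffer.write(bytes, 0, bytes.length);
       return true;
     }
   
     public boolean isAborted() {
       return aborted;
     }
   
     public static void main(String[] args) {
       CacheEntryBuilder builder = new CacheEntryBuilder(100);
       int consumed = 0;
       for (int i = 0; i < 100; i++) {  // 100 rows, ~23 bytes each
         if (!builder.append("row-" + i + ",padding-padding-x")) {
           break;
         }
         consumed++;
       }
       // The cap is hit after a handful of rows; the remainder are skipped.
       System.out.println("consumed=" + consumed + " aborted=" + builder.isAborted());
     }
   }
   ```
   
   A foreground populator sits inline with the code producing the rows, so it can stop serializing at that point; a background populator only sees the data after it has been handed off.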
   
   > Please reword.
   
   Here is my attempt:
   
   > If equal to zero, query cache is populated synchronously using the same 
thread that generated the cacheable data in the first place. This will 
generally be the query processing pool on data processes (like Historicals) or 
the HTTP server thread pool on Brokers. If greater than 0, query cache will be 
populated asynchronously using a thread pool of this size. The thread pool will 
be dedicated to query cache population, and will not be used for any other 
purpose. Note that if cacheable data is generated faster than it can be written 
to the cache, background population can lead the JVM to run out of memory.
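   For example, using the property from the table above (the thread count of 4 is an arbitrary illustrative value):
   
   ```properties
   # Default: populate the cache synchronously (foreground).
   druid.realtime.cache.numBackgroundThreads=0
   
   # Alternative: populate asynchronously with a dedicated pool of 4 threads.
   # druid.realtime.cache.numBackgroundThreads=4
   ```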
   
   I removed the bit about `maxEntrySize` since I think it's too 
inside-baseball (it's referring to the foreground populator being able to 
'notice' an oversized entry earlier, which is somewhat more efficient, but it 
doesn't seem worth mentioning in user-facing docs).
   
   By the way, as mentioned in 
https://github.com/apache/incubator-druid/pull/8116#discussion_r306037681, it 
might be a good idea to split the doc changes off into their own PR. They're 
only tangentially related to the patch's original purpose.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
