gianm commented on a change in pull request #8116: remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention
URL: https://github.com/apache/incubator-druid/pull/8116#discussion_r306041975
##########
File path: docs/content/configuration/index.md
##########

```diff
@@ -1176,6 +1176,8 @@ You can optionally configure caching to be enabled on the peons by setting cachi
 |`druid.realtime.cache.useCache`|true, false|Enable the cache on the realtime.|false|
 |`druid.realtime.cache.populateCache`|true, false|Populate the cache on the realtime.|false|
 |`druid.realtime.cache.unCacheable`|All druid query types|All query types to not cache.|`["groupBy", "select"]`|
+|`druid.realtime.cache.numBackgroundThreads`|If greater than 0, cache will be populated in the background thread pool of the configured size. By default cache is populated in the foreground, which can more efficiently handle reaching `maxEntrySize` than when done in the background. Note that there is no load shedding for background cache population, so it can also lead to out of memory scenarios depending on background threadpool utilization.|0|
```

Review comment:

> What does it mean for cache "to be populated in the background" and "to be populated in the foreground"? Could you please add a new doc page explaining these concepts and refer to it from all these configuration parameter descriptions?

IMO it's better to use the space here on this page to explain the concepts that a user needs to understand. It doesn't seem complicated enough that a new page is needed.

> It doesn't seem to me that maxEntrySize has any relation to background/foreground. It's a size of one entry, how can the verb "reach" be applied to it?

Each cache entry can potentially be built from multiple objects, and can potentially be built up incrementally. Imagine a cache entry that is derived from a `Sequence<Row>` of 100 rows, where `maxEntrySize` is "reached" after the first 40 rows.

> Please reword. Here is my attempt:

> If equal to zero, query cache is populated synchronously using the same thread that generated the cacheable data in the first place. This will generally be the query processing pool on data processes (like Historicals) or the HTTP server thread pool on Brokers. If greater than 0, query cache will be populated asynchronously using a thread pool of this size. The thread pool will be dedicated to query cache population, and will not be used for any other purpose. Note that if cacheable data is generated faster than it can be written to the cache, background population can lead the JVM to run out of memory.

I removed the bit about `maxEntrySize` since I think it's too inside-baseball (it's referring to the foreground populator being able to 'notice' an oversized entry earlier, which is somewhat more efficient, but it doesn't seem worth mentioning in user-facing docs).

By the way, as mentioned in https://github.com/apache/incubator-druid/pull/8116#discussion_r306037681, it might be a good idea to split the doc changes off into their own PR. They're only tangentially related to the patch's original purpose.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
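To make the foreground/background distinction discussed above concrete, here is a minimal, hypothetical Java sketch (the class and method names are illustrative, not Druid's actual `CachePopulator` API). The foreground path writes the entry on the thread that produced the data, while the background path hands the write to a dedicated pool. A fixed pool backed by an unbounded queue is exactly the shape that lets queued entries pile up on the heap when cacheable data is generated faster than it can be written, which is the out-of-memory risk the reworded description calls out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of foreground vs. background cache population;
// not Druid's real implementation.
public class CachePopulationSketch {
  static final Map<String, byte[]> cache = new ConcurrentHashMap<>();

  // Foreground: the thread that generated the cacheable data also writes
  // the cache entry, so population finishes before that thread moves on.
  static void populateForeground(String key, byte[] value) {
    cache.put(key, value);
  }

  // Background: hand the write to a dedicated pool. newFixedThreadPool uses
  // an unbounded LinkedBlockingQueue, so there is no load shedding: if
  // producers outpace the pool, pending entries accumulate on the heap.
  static void populateBackground(ExecutorService pool, String key, byte[] value) {
    pool.submit(() -> cache.put(key, value));
  }

  public static void main(String[] args) throws Exception {
    populateForeground("fg", new byte[] {1});

    // Analogous to numBackgroundThreads = 2.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    populateBackground(pool, "bg", new byte[] {2});
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);

    System.out.println(cache.containsKey("fg") && cache.containsKey("bg"));
  }
}
```

Under this sketch, making the pool bounded (or falling back to the caller's thread on saturation) would be the load-shedding behavior that the background populator, per the discussion above, does not have.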
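The point about `maxEntrySize` being "reached" partway through building a single entry can also be sketched. In this hypothetical example (again not Druid's actual code), one cache entry is accumulated incrementally from many serialized rows, and the builder aborts as soon as the running size exceeds the limit, mirroring the 100-row `Sequence<Row>` example where the limit is hit around row 40:

```java
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of reaching maxEntrySize mid-entry; illustrative only.
public class MaxEntrySizeSketch {
  // Accumulate serialized rows into one cache entry. Return null (skip
  // caching) as soon as the running size exceeds maxEntrySize, instead of
  // serializing every remaining row first.
  static byte[] buildEntry(List<byte[]> serializedRows, int maxEntrySize) {
    ByteArrayOutputStream entry = new ByteArrayOutputStream();
    for (byte[] row : serializedRows) {
      entry.write(row, 0, row.length);
      if (entry.size() > maxEntrySize) {
        return null; // oversized: abort early, skip the remaining rows
      }
    }
    return entry.toByteArray();
  }

  public static void main(String[] args) {
    // 100 rows of 10 bytes each; a 400-byte limit is exceeded at row 41.
    List<byte[]> rows = IntStream.range(0, 100)
        .mapToObj(i -> new byte[10])
        .collect(Collectors.toList());

    System.out.println(buildEntry(rows, 400) == null); // oversized -> skipped
    System.out.println(buildEntry(rows, 2000).length); // fits -> 1000 bytes
  }
}
```

A foreground populator running this loop can notice the oversized entry on the query thread and stop early; a background populator would already have buffered the data before its worker discovered the entry was too large, which is the (minor) efficiency difference the comment decided not to document.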
