gianm commented on code in PR #17824:
URL: https://github.com/apache/druid/pull/17824#discussion_r2017286818


##########
server/src/main/java/org/apache/druid/metadata/segment/cache/HeapMemorySegmentMetadataCache.java:
##########
@@ -71,10 +71,20 @@
 /**
  * In-memory implementation of {@link SegmentMetadataCache}.
  * <p>
- * Reads can be done from the cache only after it has synced with the metadata store.
- * Writes can happen even when the sync is in progress. This is because the
- * {@link #syncWithMetadataStore()} is able to reconcile differences based on
- * the update time of a transaction.
+ * Only used segments (excluding num_rows and schema_fingerpring) and

Review Comment:
   `schema_fingerprint` (spelling)



##########
docs/configuration/index.md:
##########
@@ -1175,8 +1175,8 @@ The following properties pertain to segment metadata caching on the Overlord tha
 
 |Property|Description|Default|
 |--------|-----------|-------|
-|`druid.manager.segments.useCache`|If `true`, segment metadata is cached on the Overlord to speed up metadata operations.|false|
-|`druid.manager.segments.pollDuration`|Duration (in ISO 8601 format) between successive syncs of the cache with the metadata store. This property is used only when `druid.manager.segments.useCache` is `true`.|`PT1M` (1 minute)|
+|`druid.manager.segments.useCache`|Denotes the usage mode of the segment metadata cache. Possible modes are: (a) `NEVER`: Cache is disable (b) `ALWAYS`: Reads are always done from the cache. Service start-up may be blocked until cache has synced with the metadata store at least once. Transactions may block until cache has synced with the metadata store at least once after becoming leader. (c) `IF_SYNCED`: Reads are done from the cache only if it has already synced with the metadata store. This mode does not block service start-up or transactions.|`NEVER`|

Review Comment:
   Suggested edits:
   
   - "Cache is disabled" (spelling)
   - "Service start-up will be blocked" (I believe this first sync will always 
block service startup, so "will" is more accurate)
   - "Transactions will block" (same reason)
   - IMO, most runtime property values in Druid are camelCase, so I'd go with 
`never`, `always`, and `ifSynced`. For an example of a way to do that with an 
enum, check out `OutputChannelMode`.
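
   For illustration, a minimal sketch of an enum with camelCase serialized values, in the spirit of `OutputChannelMode`. The names (`UsageMode`, `getJsonName`, `fromString`) are hypothetical, not the PR's actual code, and in Druid the getter and factory would typically carry Jackson's `@JsonValue` and `@JsonCreator` annotations:

   ```java
   // Hypothetical sketch: enum constants stay UPPER_CASE in Java,
   // while the externally visible value is camelCase.
   public enum UsageMode
   {
     NEVER("never"),
     ALWAYS("always"),
     IF_SYNCED("ifSynced");

     private final String jsonName;

     UsageMode(String jsonName)
     {
       this.jsonName = jsonName;
     }

     // With Jackson, this would be annotated @JsonValue.
     public String getJsonName()
     {
       return jsonName;
     }

     // With Jackson, this would be annotated @JsonCreator.
     public static UsageMode fromString(String value)
     {
       for (UsageMode mode : values()) {
         if (mode.jsonName.equals(value)) {
           return mode;
         }
       }
       throw new IllegalArgumentException("Unknown usage mode: " + value);
     }
   }
   ```

   With this shape, the runtime property would accept `never`, `always`, or `ifSynced`, and an unknown value fails fast with a clear error.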



##########
server/src/main/java/org/apache/druid/metadata/segment/cache/HeapMemorySegmentMetadataCache.java:
##########
@@ -71,10 +71,20 @@
 /**
  * In-memory implementation of {@link SegmentMetadataCache}.
  * <p>
- * Reads can be done from the cache only after it has synced with the metadata store.
- * Writes can happen even when the sync is in progress. This is because the
- * {@link #syncWithMetadataStore()} is able to reconcile differences based on
- * the update time of a transaction.
+ * Only used segments (excluding num_rows and schema_fingerpring) and
+ * pending segments are cached. Unused segments are not cached.
+ * <p>
+ * Non-leader Overlords also keep polling the metadata store to keep the cache
+ * up-to-date in case leadership changes.
+ * <p>
+ * Cache usage modes: {@link UsageMode}:

Review Comment:
   Is this line incomplete? It seems to be missing something.



##########
server/src/main/java/org/apache/druid/metadata/segment/cache/HeapMemorySegmentMetadataCache.java:
##########
@@ -71,10 +71,20 @@
 /**
  * In-memory implementation of {@link SegmentMetadataCache}.
  * <p>
- * Reads can be done from the cache only after it has synced with the metadata store.
- * Writes can happen even when the sync is in progress. This is because the
- * {@link #syncWithMetadataStore()} is able to reconcile differences based on
- * the update time of a transaction.
+ * Only used segments (excluding num_rows and schema_fingerpring) and
+ * pending segments are cached. Unused segments are not cached.
+ * <p>
+ * Non-leader Overlords also keep polling the metadata store to keep the cache
+ * up-to-date in case leadership changes.
+ * <p>
+ * Cache usage modes: {@link UsageMode}:
+ * <p>
+ * The map {@link #datasourceToSegmentCache} contains the cache for each datasource.

Review Comment:
   This seems undesirable: in a situation where many datasources are cycled through existing and not-existing (perhaps a cluster used for testing purposes), the Overlord will eventually run out of memory. Could it be solved by making `datasourceToSegmentCache` a regular map (not a concurrent map) and protecting both fetches and cleanups with a dedicated outer lock? With that approach, the outer lock should be held very briefly, just long enough to do the accounting and transfer to the inner lock.
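
   A rough sketch of that locking scheme. All names here (`DatasourceCaches`, `getCache`, `retainOnly`) are illustrative, not the PR's actual API, and the per-datasource cache is a placeholder for the real type with its own inner lock:

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.Set;

   // Sketch of the suggested fix: a plain HashMap guarded by an outer lock.
   // The outer lock is held only for map accounting; callers then work with
   // the per-datasource cache, which would hold its own inner lock.
   public class DatasourceCaches
   {
     // Placeholder for the real per-datasource cache type.
     public static class DatasourceCache
     {
     }

     private final Object lock = new Object();
     private final Map<String, DatasourceCache> datasourceToSegmentCache = new HashMap<>();

     // Fetch (or lazily create) the cache for a datasource under the outer lock.
     public DatasourceCache getCache(String dataSource)
     {
       synchronized (lock) {
         return datasourceToSegmentCache.computeIfAbsent(dataSource, ds -> new DatasourceCache());
       }
     }

     // Cleanup runs under the same lock, so a concurrent fetch cannot observe
     // or resurrect an entry while it is being removed.
     public void retainOnly(Set<String> activeDatasources)
     {
       synchronized (lock) {
         datasourceToSegmentCache.keySet().retainAll(activeDatasources);
       }
     }

     public int size()
     {
       synchronized (lock) {
         return datasourceToSegmentCache.size();
       }
     }
   }
   ```

   Because cleanup and fetch share one lock, the map's size stays bounded by the set of live datasources instead of growing monotonically.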



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

