kfaraz commented on code in PR #14616:
URL: https://github.com/apache/druid/pull/14616#discussion_r1273012308
##########
server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java:
##########
@@ -90,19 +93,46 @@ public class SegmentLoadDropHandler implements DataSegmentChangeHandler
   private final SegmentManager segmentManager;
   private final ScheduledExecutorService exec;
   private final ServerTypeConfig serverTypeConfig;
-  private final ConcurrentSkipListSet<DataSegment> segmentsToDelete;
+  private final ConcurrentSkipListSet<DataSegment> segmentsToDrop;
   private final SegmentCacheManager segmentCacheManager;

   private volatile boolean started = false;

-  // Keep history of load/drop request status in a LRU cache to maintain idempotency if same request shows up
-  // again and to return status of a completed request. Maximum size of this cache must be significantly greater
-  // than number of pending load/drop requests. so that history is not lost too quickly.
-  private final Cache<DataSegmentChangeRequest, AtomicReference<Status>> requestStatuses;
+  /**
+   * Used to cache the status of a completed load or drop request until it has
+   * been served to the (Coordinator) client exactly once.
+   * <p>
+   * The cache is used as follows:
+   * <ol>
+   * <li>An entry with state PENDING is added to the cache upon receiving a
+   * request to load or drop a segment.</li>
+   * <li>A duplicate request received at this point is immediately answered with PENDING.</li>
+   * <li>Once the load/drop finishes, the entry is updated to either SUCCESS or FAILED.</li>
+   * <li>A duplicate request received at this point is immediately answered with
+   * SUCCESS or FAILED and the entry is removed from the cache.</li>
+   * <li>If the first request itself finishes after the load or drop has already
+   * completed, it is answered with a SUCCESS or FAILED and the entry is removed
+   * from the cache.</li>
+   * </ol>
+   * <p>
+   * Maximum size of this cache must be significantly greater than the number of

Review Comment:
   If the Coordinator sends a batch larger than the cache size, the Historical would not be able to keep an entry in the cache for every request. Those requests would still be processed, but their finished statuses would never be recorded in the cache and thus never sent back to the Coordinator. The Coordinator would end up retrying them, and the Historical might duplicate some work, but there would never be a correctness issue, since we never send a response to the Coordinator for the un-cached requests.

   Given that the default cache size is 100 and the default request batch size is 1, we should be okay. We can update the docs to mention that the cache size must be at least 10x the batch size.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
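To make the lifecycle in the javadoc concrete, here is a minimal, self-contained sketch of a served-exactly-once status cache. This is not Druid's actual implementation (the real field in the diff is a `Cache<DataSegmentChangeRequest, AtomicReference<Status>>` keyed by the change request); the class `StatusCache`, its methods `submit`/`complete`, and the use of a `LinkedHashMap`-based LRU are all hypothetical simplifications for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the request-status cache lifecycle described in the
// javadoc above. Not Druid's actual code; names are illustrative only.
public class StatusCache
{
  enum State { PENDING, SUCCESS, FAILED }

  private final Map<String, State> statuses;

  StatusCache(final int maxSize)
  {
    // LRU eviction: the eldest entry is dropped once the cache exceeds maxSize.
    // This is why the cache must be much larger than a request batch: if a
    // batch overflows the cache, some finished statuses are evicted before
    // they can ever be served back to the Coordinator.
    this.statuses = new LinkedHashMap<String, State>(16, 0.75f, true)
    {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, State> eldest)
      {
        return size() > maxSize;
      }
    };
  }

  // Step 1: a new load/drop request is recorded as PENDING.
  // Steps 2, 4, 5: a duplicate request is answered with the current state;
  // a terminal state (SUCCESS/FAILED) is served exactly once, then removed.
  synchronized State submit(String segmentId)
  {
    State existing = statuses.get(segmentId);
    if (existing == null) {
      statuses.put(segmentId, State.PENDING);
      return State.PENDING;
    }
    if (existing != State.PENDING) {
      statuses.remove(segmentId);
    }
    return existing;
  }

  // Step 3: the load/drop finished; record the terminal state until it is served.
  synchronized void complete(String segmentId, State terminal)
  {
    statuses.put(segmentId, terminal);
  }
}
```

Note how a terminal status is removed on the first read: once the Coordinator has seen SUCCESS or FAILED for a request, the entry's job is done, and keeping it would only crowd out entries for newer pending requests.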
